Abstract — Since the early days of digital communication, hidden Markov models (HMMs) have also been routinely used in speech recognition, processing of natural languages, images, and in bioinformatics. In an HMM $(X_i, Y_i)_{i \ge 1}$, observations $X_1, X_2, \ldots$ are assumed to be conditionally independent given an "explanatory" Markov process $Y_1, Y_2, \ldots$, which itself is not observed; moreover, the conditional distribution of $X_i$ depends solely on $Y_i$. Central to the theory and applications of HMMs is the Viterbi algorithm, which finds a maximum a posteriori (MAP) estimate $q_{1:n} = (q_1, q_2, \ldots, q_n)$ of $Y_{1:n}$ given observed data $x_{1:n}$. Maximum a posteriori paths are also known as Viterbi paths or alignments. Recently, attempts have been made to study the behavior of Viterbi alignments when $n \to \infty$. Thus, it has been shown that in some special cases a well-defined limiting Viterbi alignment exists. While innovative, these attempts have relied on rather strong assumptions and involved proofs which are existential. This work proves the existence of infinite Viterbi alignments in a more constructive manner and for a very general class of HMMs.

Index Terms — Asymptotic, HMM, maximum a posteriori path, 
Viterbi algorithm, Viterbi extraction, Viterbi training. 



I. Introduction 

Let $Y = (Y_i)_{i \ge 1}$ be a Markov chain with state space $S = \{1, \ldots, K\}$, $K > 1$, and transition matrix $P = (p_{ij})_{i,j \in S}$. Suppose that $Y$ is irreducible and aperiodic, hence a unique stationary distribution $\pi = \pi P$ exists; suppose further that $Y_1 \sim \pi$ from time $i = 1$. To every state $l \in S$, let us assign an emission distribution $P_l$ on $(\mathcal{X}, \mathcal{B})$, where $\mathcal{X} = \mathbb{R}^D$, the $D$-dimensional Euclidean space. Let $f_l$ be the density of $P_l$ with respect to a suitable reference measure $\lambda$ on $(\mathcal{X}, \mathcal{B})$. Most commonly, $\lambda$ is either the Lebesgue measure (continuously distributed $X_i$) or the counting measure (discretely distributed $X_i$).

Definition 1.1: The stochastic process $(X, Y)$ is a hidden Markov model if there is a (measurable) function $h$ such that for each $n$, $X_n = h(Y_n, e_n)$, where $e_1, e_2, \ldots$ are i.i.d. and independent of $Y$.

Hence, the emission distribution $P_l$ is the distribution of $h(l, e_n)$. The distribution of $X$ is completely determined by $P$ and the emission distributions $P_l$, $l \in S$. It can be shown that $X$ is also ergodic [1], [2], [3]. Let $x_{1:n} = (x_1, \ldots, x_n)$ and $y_{1:n} = (y_1, \ldots, y_n)$ be fixed observed and unobserved realizations, respectively, of the HMM $(X_i, Y_i)_{i \ge 1}$ up to time $n$.
Treating $y_{1:n}$ as parameters to be estimated, let $\Lambda(q_{1:n}; x_{1:n})$ be the likelihood function $P(Y_{1:n} = q_{1:n}) \prod_{i=1}^n f_{q_i}(x_i)$ of $q_{1:n}$, and let $V(x_{1:n})$ be the set of the maximum-likelihood estimates $v(x_{1:n}) \in S^n$ of $y_{1:n}$. The elements of $V(x_{1:n})$ are called (Viterbi) alignments and are commonly computed by the Viterbi algorithm [4], [5]. If $P(Y_{1:n} = q_{1:n})$ is thought of as the prior distribution of $Y_{1:n}$, then the $v(x_{1:n})$'s also maximize the probability mass function of the posterior distribution of $Y_{1:n}$, hence the term maximum a posteriori (MAP) paths. Besides their direct significance for prediction of $Y$ from $X$, Viterbi alignments, or MAP paths, are also central to the theory and applications of HMMs [6] in the more general setting where any parameters of the emission distributions $P_l$ and any of the transition probabilities $p_{ij}$, $i, j \in S$, are also unknown and of interest. Therefore, the asymptotic behavior of Viterbi alignments is also crucial for inference on the unknown parameters [6], [7].
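To make these definitions concrete, here is a minimal Python sketch (our own illustration, not the authors' code) of the Viterbi recursion for a toy HMM whose observations take finitely many values, so that `f[l][x]` is a table of emission probabilities; states are labeled 0, 1, ... rather than 1, ..., K. It also brute-forces the whole set $V(x_{1:n})$ for comparison. Practical implementations work with log-likelihoods to avoid underflow; plain products are used here only for readability.

```python
import itertools


def likelihood(pi, P, f, q, x):
    """Lambda(q_{1:n}; x_{1:n}) = P(Y_{1:n} = q_{1:n}) * prod_i f_{q_i}(x_i)."""
    val = pi[q[0]] * f[q[0]][x[0]]
    for t in range(1, len(x)):
        val *= P[q[t - 1]][q[t]] * f[q[t]][x[t]]
    return val


def viterbi(pi, P, f, x):
    """Return (maximal likelihood, one maximizing path) via the Viterbi recursion."""
    K = len(pi)
    delta = [pi[l] * f[l][x[0]] for l in range(K)]  # scores at time 1
    back = []  # back[t][j]: a best predecessor of state j
    for obs in x[1:]:
        ptr = [max(range(K), key=lambda i: delta[i] * P[i][j]) for j in range(K)]
        delta = [delta[ptr[j]] * P[ptr[j]][j] * f[j][obs] for j in range(K)]
        back.append(ptr)
    path = [max(range(K), key=lambda l: delta[l])]
    for ptr in reversed(back):  # backtracking
        path.append(ptr[path[-1]])
    path.reverse()
    return max(delta), path


def all_alignments(pi, P, f, x, rel_tol=1e-12):
    """Brute-force the set V(x_{1:n}) of all maximizers (tiny n only)."""
    paths = itertools.product(range(len(pi)), repeat=len(x))
    lik = {q: likelihood(pi, P, f, q, x) for q in paths}
    best = max(lik.values())
    return {q for q, val in lik.items() if val >= best * (1 - rel_tol)}


# Hypothetical toy model with two states and two observation symbols.
pi = [0.6, 0.4]
P = [[0.7, 0.3], [0.4, 0.6]]
f = [[0.9, 0.1], [0.2, 0.8]]  # f[l][x] for x in {0, 1}
x = [0, 1, 0]
print(viterbi(pi, P, f, x))         # path [0, 1, 0], likelihood about 0.0467
print(all_alignments(pi, P, f, x))  # {(0, 1, 0)}
```

Here the maximizer is unique; in general `viterbi` returns just one element of $V(x_{1:n})$, with ties broken by Python's `max` (first maximizer).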

To appreciate that the question of extending $v(x_{1:n})$ ad infinitum is not a trivial one, even if the problem of non-uniqueness of $v(x_{1:n})$ is disregarded, suffice it to say that an additional observation $x_{n+1}$ can in principle change the entire alignment based on $x_{1:n}$, i.e. $v(x_{1:n})$ and $v(x_{1:n+1})_{1:n}$ can disagree significantly, if not fully. Fortunately, the situation is not hopeless, and in this paper we prove that in most HMMs alignments can be consistently extended piecewise. Specifically, motifs of (contiguous) observations $z_{1:b}$, called barriers, are observed with positive probability, forcing Viterbi alignments based on extended observations $(x_{1:n}, z_{1:b}, x_{n+b+1:n+b+r})$, $n \ge 0$, $r \ge 1$, to stabilize as follows. Roughly, $v(x_{1:n}, z_{1:b}, x_{n+b+1:n+b+r})_{1:n} = v(x_{1:n})$ for all $x_{1:n}$ and all extensions $x_{n+b+1:n+b+r}$. To be more precise, a particular state $l \in S$ and an element $z_k$ of the barrier, called a node, can be found such that regardless of the observations before and after the barrier, the alignment has to go through $l$ at time $u = n + k$. The optimality principle then ensures the stabilization $v(x_{1:n}, z_{1:b}, x_{n+b+1:n+b+r})_{1:u} = v(x_{1:u})$ and, in particular, $v_u = l$.

Suppose now that $x_{1:n}$ contains several barriers with nodes occurring at times $u_1 < \cdots < u_m < n$. Then the Viterbi alignment $v(x_{1:n})$ can be constructed piecewise as follows: Let

$$v(x_{1:n}) = (v^1, \ldots, v^{m+1}),$$

where $v^1$ is the alignment based on $x_{1:u_1}$ and ending in $l$, and let $v^i$, for $i = 2, 3, \ldots, m+1$, be the conditional alignment based on $x_{u_{i-1}+1:u_i}$ (with $u_{m+1} = n$) given that $Y_{u_{i-1}} = l$; note that the alignments $v^i$, $i = 2, 3, \ldots, m$, also end in $l$. Now, if a new observation $x_{n+1}$ is added, then the last segment $v^{m+1}$ can change, but the segments $v^1, \ldots, v^m$ remain intact. Suppose now that a realization $x_{1:\infty}$ contains infinitely many barriers, and hence also infinitely many nodes. Then the (piecewise) infinite alignment $v(x_{1:\infty})$ is defined naturally as the infinite succession of the segments $v^1, v^2, \ldots$.

In this paper, we prove that for some fixed integer $M > 0$, the probability that the finite random process $X_{1:M}$ generates a barrier is positive. Since $X$ is ergodic, almost every realization $x_{1:\infty}$ has infinitely many barriers and, therefore, the infinite piecewise alignment is well-defined. Evidently, the piecewise alignment gives rise to a decoding $v: \mathcal{X}^\infty \to S^\infty$ via $V_{1:\infty} = v(X_{1:\infty})$, which we shall call the Viterbi alignment process. The construction ensures that $V$ is regenerative and ergodic. Note also how this piecewise construction naturally calls for a buffered on-line implementation in which the memory used to store $x_{u_{i-1}+1:u_i}$ can be released once $v^i$ has been computed.

A. Previous related work and contribution of this work 

The problem of constructing infinite Viterbi processes has been brought to the attention of the IEEE Information Theory community fairly recently by [8] and [9]. Although the piecewise structure of Viterbi alignments was already acknowledged in [10], to the best of our knowledge, the subject was first seriously considered in [8], [9]. In these latter works, the existence of infinite alignments for certain special cases, such as $K = 2$ and Markov chains with additive white Gaussian noise, has been proved. In particular, in these cases the authors of [8], [9] have proved the existence of 'meeting times' and 'meeting states', which are a special (stronger) type of nodes. While innovative, the main result of [8] (Theorem 2) makes several restrictive assumptions and is proved in an existential manner, which prevents its extension beyond the $K = 2$ case.

Independently of these works, [11], [7], [12] have developed a more general theory to include the problem of estimating unknown parameters (those of the emission distributions $P_l$, and $p_{ij}$, $i, j \in S$). Namely, the focus of this theory has been the Viterbi training (VT), or extraction, algorithm [13]. Competing with EM-based procedures, this algorithm provides computationally and intuitively appealing estimates which, on the other hand, are biased, even in the limit when $n \to \infty$. In order to reduce this bias, the adjusted Viterbi training (VA) has been introduced in [11], [7], [12]. Naturally, VA relies on the existence of infinite alignments and their ergodic properties. Although the general theory has been presented in [12], [7], some of the main results of the theory (Lemmas 3.1 and 3.2 of [7]) have appeared without proof due to limitations of scope and size. This paper slightly refines these results and, most importantly, presents their complete proofs. Whereas these results are formulated for general HMMs ($K \ge 2$), [14] has most recently considered in full detail the special case of $K = 2$, generalizing similar results of [8], [9]. Specifically, it has been proved in [14] that infinitely many barriers (and hence the infinite Viterbi alignment) exist for any aperiodic and irreducible 2-state HMM. Thus, the results presented here generalize those of [14] and [8], [9] to $K > 2$. It turns out that this generalization is far from straightforward and requires more advanced analysis and tools. Furthermore, as we show below, when $K > 2$, not every aperiodic and irreducible HMM has infinitely many nodes, undermining the piecewise construction of infinite alignments for those models. The disappearance of nodes is due to the fact that an aperiodic and irreducible Markov chain can have zeros in the transition matrix. If this possibility is excluded, as is the case in [8], [9], the 'meeting times' and 'meeting states' of [8], [9] are sufficient to prove the existence of infinite Viterbi alignments for many HMMs used in practice. In their recent communication with us, the authors of [8], [9] have corrected those statements in their above works where the strict positivity of the transition matrix is implicitly assumed but formally omitted (see [7] for details). At the same time, in order to accommodate zeros in the transition matrix, [7] introduced a more general notion of nodes, effectively removing the limitations of the notion of 'meeting times' and 'meeting states'. However, the price for this generalization has been rather high due to the interfering issue of non-uniqueness of (finite) Viterbi alignments. For a detailed treatment of the piecewise construction of the infinite alignment and process in general HMMs, and the role of the infinite Viterbi process in the adjusted Viterbi training theory, we refer to the state-of-the-art article [7].

B. Organization of the rest of the paper 

In §II we briefly outline the construction of the infinite alignments (§II-B) based on [7]. This includes definitions of nodes (§II-A) and barriers (§II-C). Next, §III states our main results, which first appeared in [7] and guarantee the existence of the alignment process $V$. In §III-B, we give a counterexample to explain the necessity of our technical assumptions. In §IV, we present a complete and detailed proof of our main results. This is followed in §V by a brief discussion of the significance of the presented results.

II. Construction 

A. Nodes 

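First, consider the scores

$$\delta_u(l) \stackrel{\text{def}}{=} \max_{q \in S^{u-1}} \Lambda\big((q, l); x_{1:u}\big). \qquad (1)$$

Thus, $\delta_u(l)$ is the maximum of the likelihood of the paths terminating at $u$ in state $l$. Note that $\delta_1(l) = \pi_l f_l(x_1)$ and the recursion below

$$\delta_{u+1}(j) = \max_{l \in S}\big(\delta_u(l)\, p_{lj}\big)\, f_j(x_{u+1}) \quad \forall u \ge 1,\ \forall j \in S,$$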

helps to verify that $V(x_{1:n})$, the set of all the Viterbi alignments, can be written as follows:

$$V(x_{1:n}) = \{v \in S^n : \forall i \in S,\ \delta_n(v_n) \ge \delta_n(i), \text{ and } \forall u: 1 \le u < n,\ v_u \in t(u, v_{u+1})\},$$

where $\forall u \ge 1$, $\forall j \in S$,

$$t(u, j) = \{l \in S : \forall i \in S,\ \delta_u(l)\, p_{lj} \ge \delta_u(i)\, p_{ij}\}. \qquad (2)$$
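The characterization above translates directly into a backtracking procedure that keeps every tie and therefore enumerates all of $V(x_{1:n})$ rather than a single alignment. The Python sketch below is our illustration under simplifying assumptions: discrete emission tables `f[l][x]`, 0-based states and time indices, and a relative tolerance `tol` (a hypothetical parameter, not from the paper) used to decide floating-point ties.

```python
def scores(pi, P, f, x):
    """delta[u-1][l] = delta_u(l): max likelihood of paths ending in l at time u, eq. (1)."""
    K = len(pi)
    delta = [[pi[l] * f[l][x[0]] for l in range(K)]]  # delta_1(l) = pi_l f_l(x_1)
    for obs in x[1:]:
        d = delta[-1]
        delta.append([max(d[l] * P[l][j] for l in range(K)) * f[j][obs] for j in range(K)])
    return delta


def viterbi_set(pi, P, f, x, tol=1e-12):
    """All Viterbi alignments (as 0-based tuples), built from the sets t(u, j) of eq. (2)."""
    K = len(pi)
    delta = scores(pi, P, f, x)

    def t(u, j):
        # states l with delta_u(l) p_lj >= delta_u(i) p_ij for every i (u is 0-based here)
        best = max(delta[u][i] * P[i][j] for i in range(K))
        return [l for l in range(K) if delta[u][l] * P[l][j] >= best * (1 - tol)]

    top = max(delta[-1])
    paths = [[l] for l in range(K) if delta[-1][l] >= top * (1 - tol)]
    for u in range(len(x) - 2, -1, -1):  # backtrack, branching on every tie
        paths = [[l] + p for p in paths for l in t(u, p[0])]
    return {tuple(p) for p in paths}


# Uniform toy model: every path has the same likelihood, so all 2^3 paths are alignments.
half = [[0.5, 0.5], [0.5, 0.5]]
print(len(viterbi_set([0.5, 0.5], half, half, [0, 1, 0])))  # 8
```

The uniform example shows why $V(x_{1:n})$ is a set: the number of alignments can grow exponentially with $n$, which is the non-uniqueness issue that the piecewise construction has to handle.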

Next, we introduce $p^{(r)}_{ij}(u)$, the maximum of the likelihood realized along the paths connecting states $i$ and $j$ at times $u$ and $u + r + 1$, respectively. Thus, $p^{(0)}_{ij}(u) = p_{ij}$ and $\forall u \ge 1$, $\forall r \ge 1$, let

$$p^{(r)}_{ij}(u) = \max_{q_{1:r} \in S^r} p_{iq_1} f_{q_1}(x_{u+1})\, p_{q_1q_2} \cdots f_{q_r}(x_{u+r})\, p_{q_rj}. \qquad (3)$$
Note also that

$$\delta_{u+1}(j) = \max_{i \in S}\big(\delta_{u-r}(i)\, p^{(r)}_{ij}(u-r)\big)\, f_j(x_{u+1}) \quad \forall r < u,$$

$$p^{(r)}_{ij}(u) = \max_{q \in S} p^{(r-1)}_{iq}(u)\, f_q(x_{u+r})\, p_{qj}. \qquad (4)$$
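Recursion (4) says that the matrix $(p^{(r)}_{ij}(u))_{i,j}$ is obtained from $P$ by $r$ successive max-product ("tropical") matrix products with $(f_q(x_{u+k})\,p_{qj})_{q,j}$, $k = 1, \ldots, r$. A minimal Python sketch, assuming discrete emission tables, a 1-based time `u` as in the text and a 0-based observation list; the last lines check the identity preceding (4) on a toy model whose initial distribution $(0.6, 0.4)$ is our own choice:

```python
def max_product(A, B):
    """Max-product matrix 'multiplication': C[i][j] = max_q A[i][q] * B[q][j]."""
    n = len(A)
    return [[max(A[i][q] * B[q][j] for q in range(n)) for j in range(n)] for i in range(n)]


def p_r(P, f, x, u, r):
    """Matrix of p^{(r)}_{ij}(u), eq. (3), built with recursion (4).

    u is a 1-based time as in the text; x[t - 1] holds x_t, so u + r <= len(x) is needed.
    """
    K = len(P)
    M = [row[:] for row in P]  # p^{(0)}_{ij}(u) = p_ij
    for k in range(1, r + 1):
        obs = x[u + k - 1]  # x_{u+k}
        D = [[f[q][obs] * P[q][j] for j in range(K)] for q in range(K)]
        M = max_product(M, D)  # p^{(k)}_{ij}(u) = max_q p^{(k-1)}_{iq}(u) f_q(x_{u+k}) p_qj
    return M


P = [[0.7, 0.3], [0.4, 0.6]]
f = [[0.9, 0.1], [0.2, 0.8]]
x = [0, 1, 0]
M = p_r(P, f, x, u=1, r=1)
# Identity above (4) with u + 1 = 3 and r = 1: delta_3(0) = max_i delta_1(i) p^{(1)}_{i0}(1) f_0(x_3)
delta_1 = [0.6 * f[0][x[0]], 0.4 * f[1][x[0]]]
print(max(delta_1[i] * M[i][0] for i in range(2)) * f[0][x[2]])  # about 0.046656
```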

Definition 2.1: Let $0 \le r < n$, $u \le n - r$ and let $l \in S$. Given $x_{1:u+r}$, the first $u + r$ observations, $x_u$ is said to be an $l$-node of order $r$ if

$$\delta_u(l)\, p^{(r)}_{lj}(u) \ge \delta_u(i)\, p^{(r)}_{ij}(u) \quad \forall i, j \in S. \qquad (5)$$

Also, $x_u$ is said to be a node of order $r$ if it is an $l$-node of order $r$ for some $l \in S$; $x_u$ is said to be a strong node of order $r$ if the inequalities in (5) are strict for every $i, j \in S$, $i \ne l$. Let $x_{1:n}$ be such that $x_{u_i}$ is an $l_i$-node of order $r$, $1 \le i \le k$, for some $k < n$, and assume $u_k + r < n$ and $u_{i+1} > u_i + r$ for all $i = 1, 2, \ldots, k - 1$. Such nodes are said to be separated.
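Definition 2.1 can be checked numerically for a given prefix $x_{1:u+r}$. The sketch below is ours: it recomputes the scores and $p^{(r)}(u)$ with compact helper functions, assumes discrete emissions, and compares floating-point numbers exactly, so near-ties are decided naively. On the toy model used earlier (0-based state labels) it reports that $x_1$ is a node for state 0 (of orders 0 and 1) and that $x_2$ is a node of order 1 for state 1.

```python
def max_product(A, B):
    n = len(A)
    return [[max(A[i][q] * B[q][j] for q in range(n)) for j in range(n)] for i in range(n)]


def delta_u(pi, P, f, x, u):
    """Scores delta_u(l), l in S, from the first u observations (eq. (1))."""
    K = len(pi)
    d = [pi[l] * f[l][x[0]] for l in range(K)]
    for obs in x[1:u]:
        d = [max(d[l] * P[l][j] for l in range(K)) * f[j][obs] for j in range(K)]
    return d


def p_r(P, f, x, u, r):
    """p^{(r)}_{ij}(u) via recursion (4); u is 1-based and x[t - 1] holds x_t."""
    M = [row[:] for row in P]
    for k in range(1, r + 1):
        obs = x[u + k - 1]
        M = max_product(M, [[f[q][obs] * pqj for pqj in P[q]] for q in range(len(P))])
    return M


def node_states(pi, P, f, x, u, r, strong=False):
    """States l such that x_u is an l-node (or a strong l-node) of order r, per (5)."""
    assert u + r <= len(x), "Definition 2.1 needs x_{1:u+r}"
    d, M = delta_u(pi, P, f, x, u), p_r(P, f, x, u, r)
    K = len(pi)
    found = []
    for l in range(K):
        if strong:  # strict inequalities for every i != l and every j
            ok = all(d[l] * M[l][j] > d[i] * M[i][j]
                     for i in range(K) if i != l for j in range(K))
        else:
            ok = all(d[l] * M[l][j] >= d[i] * M[i][j] for i in range(K) for j in range(K))
        if ok:
            found.append(l)
    return found


pi, P = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]]
f = [[0.9, 0.1], [0.2, 0.8]]
x = [0, 1, 0]
print(node_states(pi, P, f, x, u=1, r=0))  # [0]
print(node_states(pi, P, f, x, u=2, r=1))  # [1]
```

Consistently with the footnote on node orders, a node of order 0 found here is also reported as a node of order 1.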

B. Piecewise alignment 

Suppose a?i :n is such that for some Uj, rj, i = 1, 2, . . . , fc, 
ui + ri < u 2 + r 2 < • • • < Ufc + r fe < n, x Ui is an k- 
node of order r,. It follows then easily from the definition 
of the node that there exists a Viterbi alignment v(xi :n ) 6 
V(xi :rt ) that goes through Z, at (i.e. v Ui = U) for each 
% = 1, 2, . . . , k (see [7]). It is not difficult to verify that such 
v(xi :n ) can actually be computed as follows: Obtain v 1 , a 
path that is optimal among all those that end at u\ in l\. (Note 
that unless the order of the node x ui is 0, v 1 need not be in 
V(xi :tll ).) Given x ul+ i :U2 , continue on by taking v 2 to be a 
maximum likelihood path from li to l 2 . That is, v 2 maximizes 
the constrained likelihood under the initial distribution (pi ± .) 
and the constraint v 2 2 _ Ul = li- Now, (v 1 ,?; 2 ) maximizes the 
likelihood given X\. U2 over all paths ending with 1%. Similarly, 
we define the pieces v 3 ,...,v k . Finally, v k+1 is chosen to 
maximize the (unconstrained) likelihood given x Uk+i:n under 
the initial distribution (pi k .). 
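The piecewise computation just described can be sketched in Python as follows, for discrete emissions. This is an illustrative sketch under our own assumptions: node times and states are supplied by the caller (they are not detected here), ties are broken by Python's `max` (first maximizer), and the brute-force check only confirms that the concatenation maximizes the likelihood among all paths passing through the prescribed state at the prescribed time.

```python
import itertools


def likelihood(init, P, f, q, x):
    """Likelihood of path q for observations x under the initial weights init."""
    val = init[q[0]] * f[q[0]][x[0]]
    for t in range(1, len(x)):
        val *= P[q[t - 1]][q[t]] * f[q[t]][x[t]]
    return val


def constrained_viterbi(init, P, f, x, end=None):
    """A maximum-likelihood path for x under initial weights init (pi, or the row
    (p_{l.}) when the previous state is l), optionally forced to end in state `end`."""
    K = len(init)
    delta = [init[l] * f[l][x[0]] for l in range(K)]
    back = []
    for obs in x[1:]:
        ptr = [max(range(K), key=lambda i: delta[i] * P[i][j]) for j in range(K)]
        delta = [delta[ptr[j]] * P[ptr[j]][j] * f[j][obs] for j in range(K)]
        back.append(ptr)
    path = [end if end is not None else max(range(K), key=lambda l: delta[l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def piecewise_alignment(pi, P, f, x, nodes):
    """Concatenate v^1, ..., v^{k+1}; nodes = [(u_1, l_1), ..., (u_k, l_k)] with
    1-based times 1 <= u_1 < ... < u_k < len(x)."""
    path, start, init = [], 0, pi
    for u, l in nodes:
        path += constrained_viterbi(init, P, f, x[start:u], end=l)
        start, init = u, P[l]
    return path + constrained_viterbi(init, P, f, x[start:])


pi, P = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]]
f = [[0.9, 0.1], [0.2, 0.8]]
x = [0, 1, 1, 0, 1]
v = piecewise_alignment(pi, P, f, x, [(2, 0)])  # force state 0 at time 2
best = max(likelihood(pi, P, f, q, x)
           for q in itertools.product(range(2), repeat=len(x)) if q[1] == 0)
print(v, abs(likelihood(pi, P, f, v, x) - best) < 1e-15)
```

Each piece only needs the observations between two consecutive nodes, which is what makes the buffered on-line implementation mentioned in the Introduction possible.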

The separated nodes assumption $u_{i+1} > u_i + r$, $1 \le i < k$, is not restrictive at all, since it is always possible to choose from any infinite sequence of nodes an infinite subsequence of separated ones. The reason for this requirement has to do with the non-uniqueness of alignments and is as follows. The fact that $x_{u_i}$ is an $r$th-order $l_i$-node guarantees that when backtracking from $u_i + r$ down to $u_i$, ties (if any) can be broken in such a way that, regardless of the values of $x_{u_i+r+1:n}$ and how ties are broken between $n$ and $u_i + r$, the alignment goes through $l_i$ at $u_i$. At the same time, the segment $u_i, \ldots, u_i + r$ is 'delicate', that is, unless $x_{u_i}$ is a strong node, breaking the ties arbitrarily within $u_i, \ldots, u_i + r$ can result in $v_{u_i} \ne l_i$. Hence, when neither $x_{u_i}$ nor $x_{u_{i+1}}$ is strong and $u_{i+1} \le u_i + r$, breaking the ties in favor of $x_{u_i}$ can result in $v_{u_{i+1}} \ne l_{i+1}$. Clearly, such a pathological situation is impossible if $r = 0$ and might also be rare in practice even for $r > 0$.

¹Note that if $x_u$ is a node of order $r$, it is then also a node of any order higher than $r$. Hence, the order of a node is defined to be the minimum such $r$.



To formalize the piecewise construction, let

$$W^l(x_{1:n}) \stackrel{\text{def}}{=} \{v \in S^n : v_n = l,\ \Lambda(v; x_{1:n}) \ge \Lambda(w; x_{1:n})\ \forall w \in S^n : w_n = l\},$$

$$V^l(x_{1:n}) \stackrel{\text{def}}{=} \{v \in V(x_{1:n}) : v_n = l\}$$

be the set of maximizers of the constrained likelihood, and the subset of maximizers of the (unconstrained) likelihood, respectively, all elements of which go through $l$ at $n$. Note that unlike $W^l(x_{1:n})$, $V^l(x_{1:n})$ might be empty. It can be shown that $V^l(x_{1:n}) \ne \emptyset \Rightarrow V^l(x_{1:n}) = W^l(x_{1:n})$. Also, let the subscript $(l)$ in $W^{l'}_{(l)}(x_{1:n})$ and $V_{(l)}(x_{1:n})$ indicate that $(p_{lj})_{j \in S}$ is used as the initial distribution in place of $\pi$. With these notations, the piecewise alignment is $v = (v^1, \ldots, v^{k+1}) \in V(x_{1:n})$, where

$$v^1 \in W^{l_1}(x_{1:u_1}), \qquad v^{k+1} \in V_{(l_k)}(x_{u_k+1:n}),$$

$$v^i \in W^{l_i}_{(l_{i-1})}(x_{u_{i-1}+1:u_i}), \quad 2 \le i \le k. \qquad (6)$$

Moreover, for $i = 1, 2, \ldots, k$, the partial paths

$$w(i) \stackrel{\text{def}}{=} (v^1, \ldots, v^i) \in W^{l_i}(x_{1:u_i}).$$

If $x_{1:\infty}$ has infinitely many (separated) nodes $\{x_{u_k}\}_{k \ge 1}$, then $v(x_{1:\infty})$, an infinite piecewise alignment based on the node times $\{u_k(x_{1:\infty})\}_{k \ge 1}$, can be defined as follows: If the sets $W^{l_i}_{(l_{i-1})}(x_{u_{i-1}+1:u_i})$, $i \ge 2$, as well as $V_{(l_k)}(x_{u_k+1:n})$ and $W^{l_1}(x_{1:u_1})$, are singletons, then (6) immediately defines a unique infinite alignment $v(x_{1:\infty}) = (v^1(x_{1:u_1}), v^2(x_{u_1+1:u_2}), \ldots)$. Otherwise, ties must be broken. If we want our infinite alignment process $V$ to be regenerative (see [7]), a natural consistency condition must be imposed on the rules used to select a unique $v(x_{1:n})$ from

$$W^{l_1}(x_{1:u_1}) \times W^{l_2}_{(l_1)}(x_{u_1+1:u_2}) \times \cdots \times W^{l_k}_{(l_{k-1})}(x_{u_{k-1}+1:u_k}) \times V_{(l_k)}(x_{u_k+1:n}).$$

In [7], the resulting infinite alignments, as well as decodings $v: \mathcal{X}^\infty \to S^\infty$ based on such alignments, are called proper. This condition is perhaps best understood through the following example. Suppose that for some $x_{1:5} \in \mathcal{X}^5$, $W^1(x_{1:5}) = \{12211, 11211\}$, and suppose the tie is broken in favor of 11211. Now, whenever $W^1(x'_{1:4})$ contains $\{1221, 1121\}$, we naturally require that 1221 not be selected. In particular, we select 1121 from $W^1(x_{1:4}) = \{1221, 1121\}$. Subsequently, 112 is selected from $W^2(x_{1:3}) = \{122, 112\}$, and so on. It can be shown that a decoding by piecewise alignment (6) with ties broken in favor of the min (or max) under the reverse lexicographic ordering of $S^n$, $n \in \mathbb{N}$, is a proper decoding.
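The tie-breaking rule can be illustrated with paths written as strings of state labels, as in the example above. The helper name is ours, and the sketch only shows that minimization under the reverse lexicographic order (compare the last state first) reproduces the selections 11211, 1121 and 112 of the example; it is not a proof that the resulting decoding is proper.

```python
def reverse_lex_min(candidates):
    """Minimum under the reverse lexicographic order: compare last states first."""
    return min(candidates, key=lambda path: path[::-1])


choice = reverse_lex_min(["12211", "11211"])
print(choice)  # 11211
# The truncations of the chosen path are selected again from the smaller sets.
print(reverse_lex_min(["1221", "1121"]), reverse_lex_min(["122", "112"]))  # 1121 112
```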

Note also that we break ties locally, i.e. within individual intervals $u_{i-1}+1, \ldots, u_i$, $i \ge 2$, enclosed by adjacent nodes. This is in contrast to a global ordering of $V(x_{1:n})$, such as the one in [8], [9]. Since a global order need not respect decomposition (6), it can fail to produce an infinite alignment going through infinitely many nodes unless the nodes are strong.

C. Barriers 

Recall (Definition 2.1) that nodes of order $r$ at time $u$ are defined relative to the entire realization $x_{1:u+r}$. Thus, whether $x_u$ is a node or not depends, in principle, on all observations up to $x_{u+r}$.

We show below that typically a block $x_{1:k} \in \mathcal{X}^k$ ($k > r$) can be found such that for any $w \ge 1$ and for any $x'_{1:w} \in \mathcal{X}^w$, the $(w + k - r)$th element of $(x'_{1:w}, x_{1:k})$ is a node of order $r$ (relative to $(x'_{1:w}, x_{1:k})$). Sequences $x_{1:k}$ that ensure the existence of such persistent nodes are called barriers in [7]. Specifically,

Definition 2.2: Given $l \in S$, $x_{1:k} \in \mathcal{X}^k$ is called a (strong) $l$-barrier of order $r \ge 0$ and length $k \ge 1$ if, for any $w \ge 1$ and for every $x'_{1:w} \in \mathcal{X}^w$, $(x'_{1:w}, x_{1:k})$ is such that $(x'_{1:w}, x_{1:k})_{w+k-r}$ is a (strong) $l$-node of order $r$.

III. Existence 

A. Clusters and main results 
For each $i \in S$, let

$$G_i \stackrel{\text{def}}{=} \{x \in \mathcal{X} : f_i(x) > 0\}.$$

Definition 3.1: We call a subset $C \subset S$ a cluster if the following conditions are satisfied:

$$\min_{j \in C} P_j\big(\cap_{i \in C} G_i\big) > 0, \quad \text{and} \quad \max_{j \notin C} P_j\big(\cap_{i \in C} G_i\big) = 0.$$

Hence, a cluster is a maximal subset of states such that $G_C = \cap_{i \in C} G_i$, the intersection of the supports of the corresponding emission distributions, is 'detectable'. Distinct clusters need not be disjoint, and a cluster can consist of a single state. In the latter case such a state is not hidden, since it is exposed by any observation it emits. When $K = 2$, $S$ is the only possible cluster, since otherwise all observations would expose their states and the underlying Markov chain would cease to be hidden. In practice, many other HMMs have the entirety of $S$ as their (necessarily unique) cluster.
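For HMMs with finitely many observation symbols, Definition 3.1 can be checked by brute force over the subsets of $S$, which is feasible only for small $K$. The sketch below (the function name and the discrete emission tables are our assumptions) finds the two disjoint clusters of a configuration of the kind used in Example 3.2, where $G_1 = G_2$ and $G_3 = G_4$ are disjoint, and the single cluster $S$ of a two-state model with overlapping supports.

```python
from itertools import combinations


def clusters(f, alphabet):
    """All clusters C of Definition 3.1 for discrete emission tables f[i][x]."""
    K = len(f)
    found = []
    for size in range(1, K + 1):
        for C in combinations(range(K), size):
            # G_C: intersection of the supports G_i = {x : f_i(x) > 0}, i in C
            G_C = [x for x in alphabet if all(f[i][x] > 0 for i in C)]
            mass = [sum(f[j][x] for x in G_C) for j in range(K)]  # P_j(G_C)
            detectable = min(mass[j] for j in C) > 0
            exclusive = max((mass[j] for j in range(K) if j not in C), default=0.0) == 0
            if detectable and exclusive:
                found.append(set(C))
    return found


# Two pairs of states with disjoint supports (states are 0-based here).
f = [[0.5, 0.5, 0.0, 0.0], [0.2, 0.8, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.9, 0.1]]
print(clusters(f, range(4)))  # [{0, 1}, {2, 3}]
```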

We now state the main results. For every state $l \in S$, let

$$p^*_l \stackrel{\text{def}}{=} \max_{j \in S} p_{jl}. \qquad (7)$$

Lemma 3.1: Assume that for each state $l \in S$,

$$P_l\Big(\big\{x \in \mathcal{X} : f_l(x)\, p^*_l > \max_{i \in S,\, i \ne l} f_i(x)\, p^*_i\big\}\Big) > 0. \qquad (8)$$

Moreover, assume that there exist a cluster $C \subset S$ and a positive integer $m$ such that the $m$th power of the substochastic matrix $Q = (p_{ij})_{i,j \in C}$ is strictly positive. Then, for some integers $M$ and $r$, $M > r \ge 0$, there exist a set $B = B_1 \times \cdots \times B_M \subset \mathcal{X}^M$, an $M$-tuple of states $q_{1:M} \in S^M$ and a state $l \in S$, such that every $x_{1:M} \in B$ is an $l$-barrier of order $r$ (and length $M$), $q_{M-r} = l$ and

$$P(X_{1:M} \in B,\ Y_{1:M} = q_{1:M}) > 0.$$

Lemma 3.1 implies that $P(X_{1:M} \in B) > 0$. Also, since every element of $B$ is a barrier of order $r$, the ergodicity of $X$ guarantees that almost every realization of $X$ contains infinitely many $l$-barriers of order $r$. Hence, almost every realization of $X$ also has infinitely many $l$-nodes of order $r$.
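For a finite observation alphabet, both hypotheses of Lemma 3.1 can be tested mechanically. The sketch below is ours: the names, the discrete emission tables, and the use of Wielandt's bound $(n-1)^2 + 1$ on the exponent needed by a primitive nonnegative $n \times n$ matrix are not taken from the paper. With `eps > 0` the same routine checks $P_l(\mathcal{X}_l) > 0$ for the sets $\mathcal{X}_l$ introduced in (10) below.

```python
def p_star(P):
    """p*_l = max_j p_jl, eq. (7)."""
    K = len(P)
    return [max(P[j][l] for j in range(K)) for l in range(K)]


def condition_8(P, f, alphabet, eps=0.0):
    """Check (8) (eps = 0), or P_l(X_l) > 0 for the sets X_l of (10) (eps > 0)."""
    ps, K = p_star(P), len(P)
    for l in range(K):
        X_l = [x for x in alphabet
               if max((ps[i] * f[i][x] for i in range(K) if i != l), default=0.0)
               < (1 - eps) * ps[l] * f[l][x]]
        if sum(f[l][x] for x in X_l) <= 0:
            return False
    return True


def has_positive_power(P, C):
    """Is some power of Q = (p_ij)_{i,j in C} strictly positive?

    Only the zero pattern matters; powers up to Wielandt's bound (n-1)^2 + 1 are checked.
    """
    C = list(C)
    n = len(C)
    Q = [[P[i][j] > 0 for j in C] for i in C]
    R = [row[:] for row in Q]  # zero pattern of Q^1
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in R):
            return True
        R = [[any(R[i][k] and Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return False


P = [[0.7, 0.3], [0.4, 0.6]]
f = [[0.9, 0.1], [0.2, 0.8]]
print(condition_8(P, f, range(2)), has_positive_power(P, [0, 1]))  # True True
```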

In two-state HMMs, $S$ is the only cluster (otherwise the Markov chain would not be hidden), hence $Q = P$. The irreducibility and aperiodicity in this case imply strict positivity of $P^2$. Thus, the only condition to be verified is (8), which in this case reads $P_1(\{x \in \mathcal{X} : f_1(x)\, p^*_1 > f_2(x)\, p^*_2\}) > 0$ and $P_2(\{x \in \mathcal{X} : f_2(x)\, p^*_2 > f_1(x)\, p^*_1\}) > 0$. In [14], it is shown that in the case of two-state HMMs, one of these two positivity conditions is always met, which, in fact, turns out to be sufficient for the existence of infinitely many strong barriers in this ($K = 2$) case. Thus, any two-state HMM with irreducible and aperiodic $Y$ has infinitely many strong barriers. Lemma 3.1 significantly generalizes this and associated results of [14]. The case $K = 2$ is special in several respects, hence the generalization is technically involved; in particular, the CLT-based proof of the existence of infinitely many nodes in [8] (Theorem 2) does not apply when $K > 2$.

For certain technical reasons, instead of extracting subsequences of separated nodes from general infinite sequences of nodes guaranteed by Lemma 3.1, we achieve node separation by adjusting the notion of barriers. Namely, note that two $r$th-order $l$-barriers $x_{j:j+M-1}$ and $x_{i:i+M-1}$ might both be in $B$ with $j < i \le j + r$, implying that the associated nodes $x_{j+M-r-1}$ and $x_{i+M-r-1}$ are not separated. Thus, we impose on $B$ the following condition:

$$x_{j:j+M-1},\ x_{i:i+M-1} \in B,\ j \ne i \ \Rightarrow\ |i - j| > r. \qquad (9)$$

If (9) holds, we say that the barriers from $B \subset \mathcal{X}^M$ are separated. This is often easy to achieve by a simple extension of $B$, as shown in the following example. Suppose there exists $x \in \mathcal{X}$ such that $x \notin B_m$ for all $m = 1, 2, \ldots, M$. All elements of $B^* \stackrel{\text{def}}{=} \{x\} \times B$ are evidently barriers and, moreover, they are now separated. The following lemma incorporates a more general version of the above example.



Lemma 3.2: Suppose the assumptions of Lemma 3.1 are satisfied. Then, for some integers $M$ and $r$, $M > r \ge 0$, there exist $B = B_1 \times \cdots \times B_M \subset \mathcal{X}^M$, $q_{1:M} \in S^M$, and $l \in S$, such that every $x_{1:M} \in B$ is a separated $l$-barrier of order $r$ (and length $M$), $q_{M-r} = l$, and $P(X_{1:M} \in B,\ Y_{1:M} = q_{1:M}) > 0$.



B. Counterexamples 

The condition on C in Lemma 3.1 might seem technical 
and even unnecessary. We next give an example of an HMM 
where the cluster condition is not met and no node (barrier) 
can occur. Then, we will modify the example to enforce the 
cluster condition and consequently gain barriers. 

Example 3.2: Let K = 4 and consider an ergodic Markov 
chain with transition matrix 



V 



Let the emission distributions be such that (8) is satisfied and $G_1 = G_2$, $G_3 = G_4$ and $G_1 \cap G_3 = \emptyset$. Hence, in this case there are two disjoint clusters $C_1 = \{1, 2\}$ and $C_2 = \{3, 4\}$. The matrices $Q_i$ corresponding to $C_i$, $i = 1, 2$, are



Evidently, the cluster assumption of Lemma 3.1 is not satisfied. Note also that the alignment cannot change its state (in one step) to the other state within the same cluster. Since the supports $G_1 = G_2$ and $G_3 = G_4$ are disjoint, any observation exposes the corresponding cluster. Hence any sequence of observations can be regarded as a sequence of blocks emitted from alternating clusters. However, the alignment inside each block stays constant. It can be shown that in this case no $x_u$ can be a node (of any order) for any $n \ge 1$, $x_{1:n} \in \mathcal{X}^n$, and $1 \le u \le n$.

Let us modify the HMM in Example 3.2 to ensure the 
assumptions of Lemma 3.1. 

Example 3.3: Let $\epsilon$ be such that $0 < \epsilon < \frac{1}{2}$ and let us replace $P$ by the following transition matrix

(\~* e \\ 

* \-t 1 

I I ' 

V I i ) 

Let the emission distributions be as in the previous example. In this case, the cluster $C_1$ satisfies the assumption of Lemma 3.1. As before, every observation exposes its cluster. Lemma 3.1 now applies to guarantee barriers and nodes. To be more specific, let $\epsilon = 1/4$, $f_1(x) = \exp(-x)\,\mathbb{1}_{\{x > 0\}}$, $f_2(x) = 2\exp(-2x)\,\mathbb{1}_{\{x > 0\}}$, $f_3(x) = \exp(x)\,\mathbb{1}_{\{x < 0\}}$, and $f_4(x) = 2\exp(2x)\,\mathbb{1}_{\{x < 0\}}$. It can then be verified that if $x_{1:2} = (1, 1)$, then $x_1$ is a 1-node of order 2. Indeed, in that case any element of $B = (0, +\infty) \times (\log 2, +\infty) \times (0, +\infty)$ is a 1-barrier of order 2.

Another way to modify the HMM in Example 3.2 to enforce the assumptions of Lemma 3.1 is to change the emission probabilities. Namely, assume that the supports $G_i$, $i = 1, \ldots, 4$, are such that $P_j(\cap_{i=1}^4 G_i) > 0$ for all $j \in S$, and (8) holds. Now, $S = \{1, \ldots, 4\}$ is the only cluster. Since the matrix $P^2$ has all its entries positive, the conditions of Lemma 3.1 are now satisfied and barriers can be constructed.

IV. Proof of the main result 

A. Proof of Lemma 3.1 

The proof below is a rather direct construction which 
is, however, technically involved. In order to facilitate the 
exposition of this proof, we have divided it into 17 short parts 
as follows. 

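1) $\mathcal{X}_l \subset \mathcal{X}$: It follows from the assumption (8) and the finiteness of $S$ that there exists an $\epsilon > 0$ such that for all $l \in S$, $P_l(\mathcal{X}_l) > 0$, where

$$\mathcal{X}_l = \{x \in \mathcal{X} : \max_{i \ne l} p^*_i f_i(x) < (1 - \epsilon)\, p^*_l f_l(x)\}. \qquad (10)$$

(Note that $p^*_i > 0$ for all $i \in S$ by irreducibility of $Y$.) Also note that the sets $\mathcal{X}_l$, $l \in S$, are disjoint and have positive reference measure $\lambda(\mathcal{X}_l) > 0$.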

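2) $Z \subset \mathcal{X}$ and $\delta$-$K$ bounds on cluster densities $f_i$, $i \in C$: Let $C$ be a cluster as in the assumptions of the Lemma. The existence of $C$ implies the existence of a set $Z \subset \cap_{i \in C} G_i$ and $\delta > 0$ such that $\lambda(Z) > 0$, and $\forall z \in Z$, the following statements hold:

(i) $\min_{i \in C} f_i(z) > \delta$;

(ii) $\max_{j \notin C} f_j(z) = 0$.

Indeed, $\min_{j \in C} P_j(\cap_{i \in C} G_i) > 0$ implies (and indeed is equivalent to) $\lambda(\cap_{i \in C} G_i) > 0$. The latter implies the existence of $Z \subset \cap_{i \in C} G_i$ with positive $\lambda$-measure and $\delta > 0$ such that (i) holds. Since $\lambda(\cap_{i \in C} G_i) > 0$, the condition $P_j(\cap_{i \in C} G_i) = 0$ for $j \notin C$ implies (is equivalent to) $f_j = 0$ $\lambda$-almost everywhere on $\cap_{i \in C} G_i$. Thus, $\max_{j \notin C} f_j = 0$ $\lambda$-almost everywhere on $\cap_{i \in C} G_i$, which implies (ii) after removing a $\lambda$-null set from $Z$.

Evidently, $K > 0$ can be chosen sufficiently large to make $\lambda(\{z \in \mathcal{X} : f_i(z) > K\})$ arbitrarily small, and in particular, to guarantee that $\lambda(\{z \in \mathcal{X} : f_i(z) > K\}) < \lambda(Z)/|C|$ for all $i \in C$, where $|C|$ is the size of $C$. Clearly then, redefining $Z \stackrel{\text{def}}{=} Z \cap \{z \in \mathcal{X} : f_i(z) \le K,\ i \in C\}$ preserves $\lambda(Z) > 0$. Next, consider

$$\lambda\big(Z \setminus (\cup_{l \in S} \mathcal{X}_l)\big). \qquad (11)$$

If (11) is positive, then define

$$Z \stackrel{\text{def}}{=} Z \setminus (\cup_{l \in S} \mathcal{X}_l). \qquad (12)$$

If (11) is zero, then there must be $s \in C$ such that

$$\lambda(Z \cap \mathcal{X}_s) > 0,$$

and in this case, let

$$Z \stackrel{\text{def}}{=} Z \cap \mathcal{X}_s. \qquad (13)$$

Such $s \in S$ must clearly exist since $\lambda(Z) > 0$ but $\lambda(Z \setminus (\cup_{l \in S} \mathcal{X}_l)) = 0$. To see that $s$ must necessarily be in the cluster $C$, note that $\forall s \notin C$, $f_s(z) = 0$ $\forall z \in Z$, which implies $Z \cap \mathcal{X}_s = \emptyset$.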

3) Sequences $s$, $a$, and $b$ of states in $S$: Let us define an auxiliary sequence of states $q_1, q_2$, and so on, as follows: If (11) is zero, that is, if $Z = Z \cap \mathcal{X}_s$ for some $s \in C$, then define $q_1 = s$; otherwise let $q_1$ be an arbitrary state in $C$. Let $q_2$ be a state with maximal probability of transition to $q_1$, i.e., $p_{q_2q_1} = p^*_{q_1}$. Suppose $q_2 \ne q_1$. Then find $q_3$ with $p_{q_3q_2} = p^*_{q_2}$. If $q_3 \notin \{q_1, q_2\}$, find $q_4$: $p_{q_4q_3} = p^*_{q_3}$, and so on. Let $U$ be the first index such that $q_U \in \{q_1, \ldots, q_{U-1}\}$, that is, $q_U = q_T$ for some $T < U$. This means that there exists a sequence of states $\{q_1, \ldots, q_U\}$ such that

• $q_T = q_U$;

• $q_{T+t} = \arg\max_{j} p_{j q_{T+t-1}}$, $t = 1, \ldots, U - T$.

To simplify the notation and without loss of generality, assume $q_U = 1$. Reorder and rename the states as follows:

$$s_1 \stackrel{\text{def}}{=} q_{U-1},\ s_2 \stackrel{\text{def}}{=} q_{U-2},\ \ldots,\ s_i \stackrel{\text{def}}{=} q_{U-i},\ \ldots,\ s_L \stackrel{\text{def}}{=} q_T = 1, \qquad i = 1, \ldots, L \stackrel{\text{def}}{=} U - T,$$

$$a_1 \stackrel{\text{def}}{=} q_{T-1},\ a_2 \stackrel{\text{def}}{=} q_{T-2},\ \ldots,\ a_P \stackrel{\text{def}}{=} q_1,$$

where $P \stackrel{\text{def}}{=} T - 1$. Hence,

$$\{q_1, \ldots, q_{T-1}, q_T, q_{T+1}, \ldots, q_{U-1}, q_U\} = \{a_P, \ldots, a_1, 1, s_{L-1}, \ldots, s_1, 1\}.$$

Note that if $T = 1$, then $P = 0$ and $\{q_1, \ldots, q_{U-1}, q_U\} = \{1, s_{L-1}, \ldots, s_1, 1\}$. We have thus introduced special sequences $a = (a_1, a_2, \ldots, a_P)$ and $s = (s_1, s_2, \ldots, s_{L-1}, 1)$. Clearly,

$$p_{s_{i-1}s_i} = p^*_{s_i},\ i = 2, \ldots, L, \qquad p^*_{s_1} = p_{1s_1},$$

$$p_{a_{i-1}a_i} = p^*_{a_i},\ i = 2, \ldots, P, \qquad p^*_{a_1} = p_{s_La_1} = p_{1a_1}. \qquad (14)$$
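The construction of $s$ and $a$ is a deterministic walk backwards along maximal transitions. A Python sketch follows; it is our illustration, with 0-based state labels, no renaming of $q_U$ to 1 (the role of state 1 is played by `s[-1]`), ties in the $\arg\max$ broken by the smallest index, and a made-up transition matrix in the example.

```python
def s_and_a(P, q1):
    """Build q_1, q_2, ... with p_{q_{t+1} q_t} = p*_{q_t} until a state repeats,
    then return the cycle s = (s_1, ..., s_L) and the tail a = (a_1, ..., a_P)."""
    K = len(P)
    q = [q1]
    while True:
        nxt = max(range(K), key=lambda j: P[j][q[-1]])  # a maximizer of p_{j q_t}
        q.append(nxt)
        if nxt in q[:-1]:
            break
    U = len(q)                  # q_U is the first repeated state
    T = q.index(q[-1]) + 1      # 1-based T < U with q_T = q_U
    s = [q[U - i - 1] for i in range(1, U - T + 1)]  # s_i = q_{U-i}, so s_L = q_T
    a = [q[T - i - 1] for i in range(1, T)]          # a_i = q_{T-i}, so a_P = q_1
    return s, a


P = [[0.4, 0.3, 0.3],
     [0.5, 0.1, 0.4],
     [0.1, 0.8, 0.1]]
s, a = s_and_a(P, q1=0)
print(s, a)  # [2, 1] [0]
```

In this example the walk is $q = (0, 1, 2, 1)$, so $T = 2$, $U = 4$, and one can verify property (14) directly: $p_{21} = p^*_1$, $p^*_2 = p_{12}$ and $p^*_0 = p_{10}$.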
Next, we are going to exhibit $b = (b_1, \ldots, b_R)$, another auxiliary sequence for some $R \ge 1$, characterized as follows:

(i) $b_R = 1$;

(ii) $\exists b_0 \in C$ such that $p_{b_0b_1} p_{b_1b_2} \cdots p_{b_{R-1}b_R} > 0$;

(iii) if $R > 1$, then $b_{i-1} \ne b_i$ for every $i = 1, \ldots, R$.

Thus, the path $b_{0:R}$ connects cluster $C$ to state 1 in $R$ steps. Let us also require that $R$ be the minimum such. Clearly such $b$ and $b_0$ do exist due to the irreducibility of $Y$. Note also that minimality of $R$ guarantees (iii) (in the special case of $R = 1$ it may happen that $b_1 = 1 \in C$ and $p_{11} > 0$, in which case $b_0$ can also be taken to be 1).
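Because $R$ is required to be minimal, the path $b_{0:R}$ can be found by a breadth-first search over positive-probability transitions started simultaneously from all states of $C$. The sketch below is our illustration (0-based states, a hypothetical `target` argument playing the role of state 1, a made-up transition matrix); it also covers the special case $R = 1$, $b_0 = b_1$ mentioned above.

```python
from collections import deque


def entry_path(P, C, target):
    """A shortest path b_0, b_1, ..., b_R = target with b_0 in C, R >= 1, and every
    transition of positive probability (multi-source breadth-first search)."""
    K = len(P)
    queue = deque([c] for c in C)  # paths with R = 0 starting in the cluster
    seen = set()
    while queue:
        path = queue.popleft()
        for j in range(K):
            if P[path[-1]][j] > 0 and j not in seen:
                if j == target:
                    return path + [j]
                seen.add(j)
                queue.append(path + [j])
    return None  # cannot happen for an irreducible chain


P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
print(entry_path(P, C=[0], target=2))  # [0, 1, 2]
print(entry_path(P, C=[0], target=0))  # [0, 0]
```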

4) Determining $k$: Let $Q^m$ be the $m$th power of the substochastic matrix $Q = (p_{ij})_{i,j \in C}$, and let $q^{(m)}_{ij}$ be the entries of $Q^m$. By the hypothesis of the Lemma, $q^{(m)}_{ij} > 0$ $\forall i, j \in C$. This means that for every $i, j \in C$, there exists a positive-probability path from $i$ to $j$ of length $m$ within $C$. Let $q^*_{ij}$ be the probability of a maximum-probability such path from $i$ to $j$. In other words, for every $i, j \in C$, there exist states $w_1, \ldots, w_{m-1} \in C$ such that

$$p_{iw_1} p_{w_1w_2} \cdots p_{w_{m-2}w_{m-1}} p_{w_{m-1}j} = q^*_{ij} > 0. \qquad (15)$$

Let us define

$$q \stackrel{\text{def}}{=} \min_{i,j \in C} q^*_{ij} > 0, \quad \text{and} \qquad (16)$$

$$A \stackrel{\text{def}}{=} \max_{i \in S} \max_{j \in S} \left\{\frac{1}{p_{ij}} : p_{ij} > 0\right\}, \qquad (17)$$

where the $p^*$'s are as defined in (7). Choose $k$ sufficiently large for the following to hold:

A~ R , (18) 

where $\epsilon$ is as in (10), and $\delta$ and $K$ are as introduced in §IV-A2.
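The quantities $q^*_{ij}$ in (15) and the constant $q$ in (16) are computable by a max-product matrix power restricted to $C$. The Python sketch below assumes $m \ge 1$ and uses a made-up two-state chain with $C = S$; the function name is ours.

```python
def max_path_probs(P, C, m):
    """q*_{ij}, i, j in C: largest probability of an m-step path from i to j
    that stays in C (eq. (15)), i.e. the m-th max-product power of Q."""
    Q = [[P[i][j] for j in C] for i in C]
    n = len(C)
    R = [row[:] for row in Q]
    for _ in range(m - 1):
        R = [[max(R[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return R


P = [[0.7, 0.3], [0.4, 0.6]]
qstar = max_path_probs(P, C=[0, 1], m=2)
q = min(min(row) for row in qstar)  # the constant q of (16)
print(q)  # about 0.21
```

Replacing `max` by `sum` in the inner product would give the ordinary entries $q^{(m)}_{ij}$ of $Q^m$; both are positive for exactly the same pairs $(i, j)$, which is why the hypothesis $Q^m > 0$ yields $q > 0$.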

5) The s-path: We now fix the state sequence

$$b_0, b_1, \ldots, b_R, s_1, s_2, \ldots, s_{2Lk}, a_1, \ldots, a_P, \qquad (19)$$

where $s_{i+Lj} = s_i$, $j = 1, \ldots, 2k-1$, $i = 1, \ldots, L$ (and in particular $s_{Lj} = 1$, $j = 1, \ldots, 2k$). The sequence (19) will be called the s-path. The s-path is a concatenation of $2k$ cycles $s_{1:L}$, the beginning and the end of which are connected to the cluster $C$ via the positive-probability paths $b$ and $a$, respectively (recall that $a_P = q_1 \in C$ and $b_R = 1$ by construction). Additionally, the $b_R, s_1, s_2, \ldots, s_{2Lk}, a_1, \ldots, a_P$ segment of the s-path (19) has the important property (14), i.e. every consecutive transition along this segment occurs with the maximal transition probability given its destination state. (However, $b$, the beginning of the s-path, need not satisfy this property.) The s-path is almost ready to serve as the $q_{1:M}$ promised by the Lemma, and its conversion to $q_{1:M}$ will be completed in §IV-A17. In fact, the idea of the Lemma and its proof is to exhibit (a cylinder subset of) observations such that, once emitted along the s-path, these observations trap the Viterbi backtracking so that the latter winds up on the s-path. That will guarantee that an observation corresponding to the beginning of the s-path is a node.



6) The barrier: Consider the following sequence of observations

$$z_0, z_1, \ldots, z_m, y'_1, \ldots, y'_{R-1}, y_0, y_1, \ldots, y_{2Lk}, y''_1, \ldots, y''_P, z'_1, \ldots, z'_m, \qquad (20)$$

where

$$z_0, z_i, z'_i \in Z, \quad i = 1, \ldots, m;$$

$$y'_i \in \mathcal{X}_{b_i}, \quad i = 1, \ldots, R-1;$$

$$y_0 \in \mathcal{X}_1, \quad y_{i+Lj} \in \mathcal{X}_{s_i}, \quad j = 0, \ldots, 2k-1,\ i = 1, \ldots, L;$$

$$y''_i \in \mathcal{X}_{a_i}, \quad i = 1, \ldots, P.$$

From this point on, throughout §IV-A15, we shall be proving that $y_{Lk}$ is a 1-node of order $(kL + m + P)$ and, therefore, that (20) is a 1-barrier of order $(kL + m + P)$.

First, let $u \ge 2Lk + 2m + 1 + P + R$ and let $x_{1:u}$ be any sequence of observations containing the sequence (20) in the tail.

7) a, (3, 7, r\: Recall the definition of the scores S u (i) (1) 
and the maximum partial likelihoods pjv ] (u) (3). Now, we 
need to introduce the following abbreviated notation. For any 
i, j £ S and appropriate r > 0, let 

Si(yi) 8 u - P - m -2kL+l(i) V/ : < / < 2kL 
(M ) = f pV (u-P-m-2kL + l), (21) 
pg) (y>) d £ p W ( u - P - m - 2kL - R + I) VI: 
1<1<R-1, 

Si(zi) = 5 u -2Lk-2m-P-R+l(i) V/ : < / < TO, 

p§\zt) d = p§\u - 2Lk - 2m — P — R + l), 

5 t (z[) d = 6 u - m+ i(i) VI : 1 < / < to, 

P$M)^P$\u-m + t). (22) 

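Also, we will frequently use the scores corresponding to $z_0$, $z_m$, $y_0$, and $y_{kL}$, hence the following further abbreviations:

$$\alpha_i := \delta_i(z_0), \quad \beta_i := \delta_i(z_m), \quad \gamma_i := \delta_i(y_0), \quad \eta_i := \delta_i(y_{kL}).$$

Note that $\forall j \notin C$, $f_j(z_0) = f_j(z_l) = f_j(z'_l) = 0$, $l = 1, \ldots, m$, by construction of $\mathcal{Z}$ (§IV-A2). Hence, $\alpha_j = \beta_j = 0$ $\forall j \notin C$, and a more general implication is that for every $j \in S$

$$\beta_j = \max_i \alpha_i p^{(m-1)}_{ij}(z_0) f_j(z_m) = \alpha_{i_\beta(j)} p^{(m-1)}_{i_\beta(j) j}(z_0) f_j(z_m) \quad \text{for some } i_\beta(j) \in C; \tag{23}$$

$$\gamma_j = \max_i \beta_i p^{(R-1)}_{ij}(z_m) f_j(y_0) = \beta_{i_\gamma(j)} p^{(R-1)}_{i_\gamma(j) j}(z_m) f_j(y_0) \quad \text{for some } i_\gamma(j) \in C. \tag{24}$$

Also, we will use the following representation of $\eta$ in terms of $\gamma$:

$$\eta_j = \max_i \gamma_i p^{(kL-1)}_{ij}(y_0) f_j(y_{kL}) = \gamma_{i_\eta(j)} p^{(kL-1)}_{i_\eta(j) j}(y_0) f_j(y_{kL}) \quad \text{for some } i_\eta(j) \in S. \tag{25}$$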
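Representations (23), (24) and (25) are all instances of the recursion (4), which links the scores $\delta_u(i)$ of (1) to the maximum partial likelihoods $p^{(r)}_{ij}(u)$ of (3). Those definitions are not reproduced in this section, so the Python sketch below assumes the readings $\delta_u(i) = \max_{q_{1:u-1}} \pi_{q_1} f_{q_1}(x_1) p_{q_1 q_2} \cdots p_{q_{u-1} i} f_i(x_u)$, $p^{(r)}_{ij}(u) = \max_{q_{1:r}} p_{i q_1} f_{q_1}(x_{u+1}) \cdots f_{q_r}(x_{u+r}) p_{q_r j}$ (with $p^{(0)}_{ij}(u) = p_{ij}$) and $\delta_{u+r+1}(j) = \max_i \delta_u(i) p^{(r)}_{ij}(u) f_j(x_{u+r+1})$. Emission densities are passed in already evaluated, as a table `E` with `E[u-1][i]` $= f_i(x_u)$; all function names are hypothetical.

```python
import itertools
import math
import random


def scores(pi, P, E):
    """Viterbi scores: D[u-1][i] stands for delta_u(i), u = 1..n.

    E[u-1][i] is the emission density f_i(x_u) at the u-th observation.
    """
    K = len(pi)
    D = [[pi[i] * E[0][i] for i in range(K)]]
    for t in range(1, len(E)):
        prev = D[-1]
        D.append([max(prev[i] * P[i][j] for i in range(K)) * E[t][j]
                  for j in range(K)])
    return D


def partial_max(P, E, u, r):
    """p^{(r)}_{ij}(u): best path i -> q_1..q_r -> j emitting x_{u+1}..x_{u+r}."""
    K = len(P)
    M = [row[:] for row in P]  # r = 0: p^{(0)}_{ij}(u) = p_ij
    for s in range(r):
        f = E[u + s]  # f_h(x_{u+s+1}); E is 0-based in time
        M = [[max(M[i][h] * f[h] * P[h][j] for h in range(K)) for j in range(K)]
             for i in range(K)]
    return M


def partial_max_brute(P, E, u, r):
    """Same quantity by enumerating all K**r intermediate paths (small cases only)."""
    K = len(P)
    best = [[0.0] * K for _ in range(K)]
    for i, j in itertools.product(range(K), repeat=2):
        for q in itertools.product(range(K), repeat=r):
            path = (i,) + q + (j,)
            val = 1.0
            for s in range(r + 1):
                val *= P[path[s]][path[s + 1]]
                if s < r:
                    val *= E[u + s][path[s + 1]]
            best[i][j] = max(best[i][j], val)
    return best


def random_hmm(K, n, seed):
    rng = random.Random(seed)
    P = [[rng.random() + 0.01 for _ in range(K)] for _ in range(K)]
    P = [[x / sum(row) for x in row] for row in P]
    E = [[rng.random() for _ in range(K)] for _ in range(n)]
    return [1.0 / K] * K, P, E


def check_recursion(K=3, n=6, seed=0):
    """Verify (4): delta_{u+r+1}(j) = max_i delta_u(i) p^{(r)}_{ij}(u) f_j(x_{u+r+1})."""
    pi, P, E = random_hmm(K, n, seed)
    D = scores(pi, P, E)
    for u in range(1, n):
        for r in range(n - u):  # keeps u + r + 1 <= n
            M = partial_max(P, E, u, r)
            for j in range(K):
                rhs = max(D[u - 1][i] * M[i][j] for i in range(K)) * E[u + r][j]
                if not math.isclose(D[u + r][j], rhs, rel_tol=1e-12):
                    return False
    return True


print(check_recursion())
```

The max-product loop in `partial_max` costs $O(rK^3)$, whereas the brute-force version is exponential in $r$ and is only meant as a cross-check on tiny models.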
8) Bounds on $\beta$: Recall (§IV-A3) that $b_0 \in C$. We show that for every $j \in S$

$$\beta_j \le \frac{1}{q}\left(\frac{K}{\delta}\right)^m \beta_{b_0}. \tag{26}$$

Fix $j \in S$ and consider $i_\beta(j)$ from (23). Let $v_1, \ldots, v_{m-1}$ be a path that realizes $p^{(m-1)}_{i_\beta(j) j}(z_0)$. Then

$$\beta_j = \alpha_{i_\beta(j)} p_{i_\beta(j) v_1} f_{v_1}(z_1) p_{v_1 v_2} f_{v_2}(z_2) \cdots p_{v_{m-1} j} f_j(z_m) \le \alpha_{i_\beta(j)} K^m.$$

(The last inequality follows from (12), (13).) Let $w_1, \ldots, w_{m-1}$ be a maximum-probability path from $i_\beta(j)$ to $b_0$ as in (15). Thus,

$$\beta_{b_0} \ge \alpha_{i_\beta(j)} p_{i_\beta(j) w_1} f_{w_1}(z_1) \cdots p_{w_{m-1} b_0} f_{b_0}(z_m) \ge \alpha_{i_\beta(j)} q \delta^m.$$

(The last inequality again follows from (12), (13).) Since $q > 0$ (16), we thus obtain

$$\beta_j \le \alpha_{i_\beta(j)} K^m \le \frac{\beta_{b_0}}{q \delta^m} K^m,$$

as required.

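9) Likelihood ratio bounds: We next prove the following claims:

$$p^{(L-1)}_{i1}(y_{lL}) \le p^{(L-1)}_{11}(y_{lL}) \quad \forall i \in S,\ \forall l = 0, \ldots, 2k-1, \tag{27}$$

$$\frac{p^{(L-1)}_{ij}(y_{lL}) f_j(y_{(l+1)L})}{p^{(L-1)}_{11}(y_{lL}) f_1(y_{(l+1)L})} \le 1 - \varepsilon \quad \forall i, j \in S,\ j \ne 1,\ \forall l: 0 \le l \le 2k-1, \tag{28}$$

$$p^{(R-1)}_{ij}(z_m) f_j(y_0) \le A^R p^{(R-1)}_{b_0 1}(z_m) f_1(y_0) \quad \forall i, j \in S, \tag{29}$$

$$\frac{p^{(m+P-1)}_{ij}(y_{2kL})}{p^{(m+P-1)}_{1j}(y_{2kL})} \le \frac{1}{q}\left(\frac{K}{\delta}\right)^{m-1} \quad \forall j \in C,\ \forall i \in S. \tag{30}$$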

If $L = 1$, then (27) becomes $p_{i1} \le p_{11}$ for all $i \in S$, which is true by the assumption $p^*_1 = p_{11}$ made in the course of constructing the $s$ sequence (§IV-A3). If $L = 1$, then (28) becomes

$$\frac{p_{ij} f_j(y_{l+1})}{p_{11} f_1(y_{l+1})} \le 1 - \varepsilon,$$

and thus, since $y_{l+1} \in \mathcal{X}_1$ for $0 \le l < 2k$ in this case, (28) is true by the definition of $\mathcal{X}_1$ (§IV-A1) (and the fact that $p^*_1 = p_{11}$).
Let us next prove (27) and (28) for the case $L > 1$. Consider any $l = 0, 1, \ldots, 2k-1$. Note that the definitions of the s-path (19) and of $\mathcal{X}_{s_i}$ (§IV-A1), together with the fact that $y_{lL+i} \in \mathcal{X}_{s_i}$ for $1 \le i \le L$, imply that, given the observations $y_{lL+1:(l+1)L-1}$, the path $s_{1:L-1}$ realizes the maximum in $p^{(L-1)}_{11}(y_{lL})$, i.e.

$$p^{(L-1)}_{11}(y_{lL}) = p_{1 s_1} f_{s_1}(y_{lL+1}) p_{s_1 s_2} \cdots p_{s_{L-2} s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1}) p_{s_{L-1} 1}. \tag{31}$$

(Indeed,

$$p_{1 s_1} f_{s_1}(y_{lL+1}) p_{s_1 s_2} \cdots p_{s_{L-2} s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1}) p_{s_{L-1} 1} = p^*_{s_1} f_{s_1}(y_{lL+1}) p^*_{s_2} \cdots p^*_{s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1}) p^*_1,$$

and for $i = 1, 2, \ldots, L-1$, $p^*_{s_i} f_{s_i}(y_{lL+i}) \ge p_{hj} f_j(y_{lL+i})$ for any $h, j \in S$.) Suppose $j \ne 1$ and $t_1, \ldots, t_{L-1}$ realizes $p^{(L-1)}_{ij}(y_{lL})$, i.e.

$$p^{(L-1)}_{ij}(y_{lL}) = p_{i t_1} f_{t_1}(y_{lL+1}) p_{t_1 t_2} \cdots p_{t_{L-2} t_{L-1}} f_{t_{L-1}}(y_{(l+1)L-1}) p_{t_{L-1} j}. \tag{32}$$

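Hence, with $t_0$ and $t_L$ standing for $i$ and $j$, respectively (and $s_0 = s_L = 1$), the left-hand side of (28) becomes

$$\left(\frac{p_{t_0 t_1} f_{t_1}(y_{lL+1})}{p_{s_0 s_1} f_{s_1}(y_{lL+1})}\right)\left(\frac{p_{t_1 t_2} f_{t_2}(y_{lL+2})}{p_{s_1 s_2} f_{s_2}(y_{lL+2})}\right) \cdots \left(\frac{p_{t_{L-2} t_{L-1}} f_{t_{L-1}}(y_{(l+1)L-1})}{p_{s_{L-2} s_{L-1}} f_{s_{L-1}}(y_{(l+1)L-1})}\right)\left(\frac{p_{t_{L-1} t_L} f_j(y_{(l+1)L})}{p_{s_{L-1} s_L} f_1(y_{(l+1)L})}\right). \tag{33}$$

For $h = 1, \ldots, L$ such that $t_h \ne s_h$,

$$\frac{p_{t_{h-1} t_h} f_{t_h}(y_{lL+h})}{p_{s_{h-1} s_h} f_{s_h}(y_{lL+h})} \le 1 - \varepsilon, \quad \text{since } y_{lL+h} \in \mathcal{X}_{s_h}. \tag{34}$$

For all other $h$, $s_h = t_h$, and therefore the left-hand side of (34) becomes $\frac{p_{t_{h-1} s_h}}{p_{s_{h-1} s_h}} = \frac{p_{t_{h-1} s_h}}{p^*_{s_h}} \le 1$ (by property (14)). Since the last term of the product (33) above does satisfy (34) ($j \ne 1$), (28) is thus proved. Suppose next that $t_1, \ldots, t_{L-1}$ realizes $p^{(L-1)}_{i1}(y_{lL})$. With $t_L = 1$ and $t_0 = i$, similarly to the previous arguments, we have

$$\frac{p^{(L-1)}_{i1}(y_{lL})}{p^{(L-1)}_{11}(y_{lL})} = \prod_{h=1}^{L-1}\left(\frac{p_{t_{h-1} t_h} f_{t_h}(y_{lL+h})}{p_{s_{h-1} s_h} f_{s_h}(y_{lL+h})}\right) \frac{p_{t_{L-1} 1}}{p_{s_{L-1} 1}} \le 1,$$

implying (27).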



Let us now prove (29). To that end, note that for all states $h, i, j \in S$ such that $p_{jh} > 0$, it follows from the definitions (7) and (17) that

$$\frac{p_{ih}}{p_{jh}} \le \frac{p^*_h}{p_{jh}} \le A. \tag{35}$$
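Inequality (35) needs only the column maxima of $P$ and one global ratio. The definitions (7) and (17) are not reproduced in this section, so the Python sketch below assumes the readings $p^*_h = \max_i p_{ih}$ (consistent with its use in (31) and (34)) and $A = \max\{p^*_h / p_{jh} : p_{jh} > 0\}$; the helper names and the example matrix are hypothetical.

```python
def column_maxima(P):
    """p*_h = max_i p_ih: the largest probability of entering state h."""
    K = len(P)
    return [max(P[i][h] for i in range(K)) for h in range(K)]


def ratio_bound(P):
    """A = max over pairs (j, h) with p_jh > 0 of p*_h / p_jh (assumed reading of (17))."""
    p_star = column_maxima(P)
    K = len(P)
    return max(p_star[h] / P[j][h]
               for j in range(K) for h in range(K) if P[j][h] > 0)


def satisfies_35(P):
    """Check p_ih / p_jh <= p*_h / p_jh <= A whenever p_jh > 0."""
    A = ratio_bound(P)
    p_star = column_maxima(P)
    K = len(P)
    for h in range(K):
        for j in range(K):
            if P[j][h] == 0:
                continue  # (35) is only claimed for p_jh > 0
            for i in range(K):
                if not (P[i][h] / P[j][h] <= p_star[h] / P[j][h] <= A):
                    return False
    return True


P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.1, 0.1, 0.8]]
print(column_maxima(P), ratio_bound(P), satisfies_35(P))
```

Zero entries of $P$ are skipped on purpose: they do not enter $A$, which is why the proof of (29) first checks that every transition $b_{r-1} \to b_r$ has positive probability.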

If $R = 1$, then (29) becomes

$$p_{ij} f_j(y_0) \le A p_{b_0 1} f_1(y_0).$$

By the definition of $\mathcal{X}_1$ (recall that $y_0 \in \mathcal{X}_1$), we have that for every $i, j \in S$, $p_{ij} f_j(y_0) \le p^*_1 f_1(y_0)$. Using (35) with $h = 1$ and $j = b_0$, we get $p^*_1 f_1(y_0) \le A p_{b_0 1} f_1(y_0)$ ($p_{b_0 1} > 0$ by the construction of $b$, §IV-A3). Putting these all together, we obtain

$$p_{ij} f_j(y_0) \le p^*_1 f_1(y_0) \le A p_{b_0 1} f_1(y_0),$$

as required.

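Consider the case $R > 1$. Let $t_1, \ldots, t_{R-1}$ be a path that realizes $p^{(R-1)}_{ij}(z_m)$, i.e.

$$p^{(R-1)}_{ij}(z_m) = p_{i t_1} f_{t_1}(y'_1) p_{t_1 t_2} f_{t_2}(y'_2) \cdots p_{t_{R-2} t_{R-1}} f_{t_{R-1}}(y'_{R-1}) p_{t_{R-1} j}.$$

By the definition of $\mathcal{X}_l$ (§IV-A1) and the facts that $y'_r \in \mathcal{X}_{b_r}$, $r = 1, 2, \ldots, R-1$, and $y_0 \in \mathcal{X}_1$, we have

$$p^{(R-1)}_{ij}(z_m) f_j(y_0) \le p^*_{b_1} f_{b_1}(y'_1)\, p^*_{b_2} f_{b_2}(y'_2) \cdots p^*_{b_{R-1}} f_{b_{R-1}}(y'_{R-1})\, p^*_1 f_1(y_0). \tag{36}$$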
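Now, by the construction of $b$ (§IV-A3), $p_{b_{r-1} b_r} > 0$ for $r = 1, \ldots, R$ ($b_R = 1$). Thus, the argument behind (35) applies here to bound the right-hand side of (36) from above by

$$A p_{b_0 b_1} f_{b_1}(y'_1)\, A p_{b_1 b_2} f_{b_2}(y'_2) \cdots A p_{b_{R-1} 1} f_1(y_0) \le A^R p^{(R-1)}_{b_0 1}(z_m) f_1(y_0),$$

as required.

Let us now prove (30). If $m = 1$, then (30) becomes

$$p^{(P)}_{ij}(y_{2kL}) \le p^{(P)}_{1j}(y_{2kL})\, q^{-1} \quad \forall j \in C,\ \forall i \in S. \tag{37}$$

If $P = 0$, then (37) reduces to $p_{ij} \le p_{1j} q^{-1}$, which is true because in this case state 1 belongs to $C$ (§IV-A3) and $p_{1j} q^{-1} \ge 1$ ((15), (16) with $m = 1$). To see why (37) is true with $P \ge 1$, note that by the same argument as used for proving (27) and (28), we now get

$$p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) \ge p^{(P-1)}_{ih}(y_{2kL}) f_h(y''_P) \quad \forall h, i \in S. \tag{38}$$

Also, since $a_P \in C$ (§IV-A3), $p_{a_P j} q^{-1} \ge 1$ ((15), (16) with $m = 1$). Thus

$$\begin{aligned}
p^{(P)}_{ij}(y_{2kL}) &= \max_h p^{(P-1)}_{ih}(y_{2kL}) f_h(y''_P) p_{hj} \overset{(38)}{\le} p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) \max_h p_{hj}\\
&\le p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) \le p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) p_{a_P j} q^{-1} \le p^{(P)}_{1j}(y_{2kL})\, q^{-1}.
\end{aligned}$$

For $m > 1$, let $t_{1:m-1}$ be a path realizing $p^{(m-1)}_{ij}(y''_P)$. Thus,

$$p^{(m-1)}_{ij}(y''_P) = p_{i t_1} f_{t_1}(z'_1) p_{t_1 t_2} f_{t_2}(z'_2) \cdots f_{t_{m-1}}(z'_{m-1}) p_{t_{m-1} j} \le K^{m-1}. \tag{39}$$

(This is true since $z'_r \in \mathcal{Z}$ for $r = 1, 2, \ldots, m-1$ (§IV-A2), and thus, for $p^{(m-1)}_{ij}(y''_P)$ to be positive, it is necessary that $t_r \in C$, $r = 1, \ldots, m-1$, implying $f_{t_r}(z'_r) \le K$.)

Now, let $t_{1:m-1}$ realize $p^{(m-1)}_{a_P j}(y''_P)$, which is clearly positive, with $t_r \in C$, $r = 1, \ldots, m-1$ ($z'_r \in \mathcal{Z}$ for $r = 1, 2, \ldots, m-1$), and $a_P, j \in C$ (recall the positivity assumption on $Q^m$, §IV-A4). We thus have

$$p^{(m-1)}_{a_P j}(y''_P) \ge q^*_{a_P j} f_{t_1}(z'_1) f_{t_2}(z'_2) \cdots f_{t_{m-1}}(z'_{m-1}) \ge q \delta^{m-1}. \tag{40}$$

Combining the bounds (39) and (40) ($q > 0$, (16)), we obtain

$$p^{(m-1)}_{ij}(y''_P) \le p^{(m-1)}_{a_P j}(y''_P)\, \frac{1}{q}\left(\frac{K}{\delta}\right)^{m-1}. \tag{41}$$

Finally,

$$\begin{aligned}
p^{(P+m-1)}_{ij}(y_{2kL}) &= \max_h p^{(P-1)}_{ih}(y_{2kL}) f_h(y''_P) p^{(m-1)}_{hj}(y''_P)\\
&\overset{(38),(41)}{\le} p^{(P-1)}_{1 a_P}(y_{2kL}) f_{a_P}(y''_P) p^{(m-1)}_{a_P j}(y''_P)\, \frac{1}{q}\left(\frac{K}{\delta}\right)^{m-1}\\
&\le p^{(P+m-1)}_{1j}(y_{2kL})\, \frac{1}{q}\left(\frac{K}{\delta}\right)^{m-1},
\end{aligned}$$

which proves (30).

10) $\gamma_j \le \mathrm{const} \times \gamma_1$: Combining (24), (26), and (29), we see that for every state $j \in S$,

$$\begin{aligned}
\gamma_j &\overset{(24)}{=} \beta_{i_\gamma(j)} p^{(R-1)}_{i_\gamma(j) j}(z_m) f_j(y_0) \overset{(29)}{\le} A^R \beta_{i_\gamma(j)} p^{(R-1)}_{b_0 1}(z_m) f_1(y_0)\\
&\overset{(26)}{\le} A^R \frac{1}{q}\left(\frac{K}{\delta}\right)^m \beta_{b_0} p^{(R-1)}_{b_0 1}(z_m) f_1(y_0) \overset{(24)}{\le} \gamma_1 \frac{1}{q}\left(\frac{K}{\delta}\right)^m A^R.
\end{aligned} \tag{42}$$

Hence

$$\gamma_j \le c\, \gamma_1 \quad \forall j \in S, \tag{43}$$

where $c := \frac{1}{q}\left(\frac{K}{\delta}\right)^m A^R$.

11) Further bounds on likelihoods: Let $l \ge 0$ and $n \ge 1$ be integers such that $l + n \le 2k$ but arbitrary otherwise. Expanding $p^{(nL-1)}_{11}(y_{lL})$ recursively according to (4), we obtain

$$p^{(nL-1)}_{11}(y_{lL}) = \max_{i_1, \ldots, i_{n-1}} p^{(L-1)}_{1 i_1}(y_{lL}) f_{i_1}(y_{(l+1)L})\, p^{(L-1)}_{i_1 i_2}(y_{(l+1)L}) f_{i_2}(y_{(l+2)L}) \cdots f_{i_{n-1}}(y_{(l+n-1)L})\, p^{(L-1)}_{i_{n-1} 1}(y_{(l+n-1)L}). \tag{44}$$

Since for any $h \in S$,

$$p^{(L-1)}_{1h}(y_{lL}) f_h(y_{(l+1)L}) \overset{(28)}{\le} p^{(L-1)}_{11}(y_{lL}) f_1(y_{(l+1)L}),$$

as well as

$$p^{(L-1)}_{i_{r-1} i_r}(y_{(l+r-1)L}) f_{i_r}(y_{(l+r)L}) \le p^{(L-1)}_{11}(y_{(l+r-1)L}) f_1(y_{(l+r)L}), \quad r = 2, \ldots, n-1,$$

and, for any $i_{n-1} \in S$, by (27),

$$p^{(L-1)}_{i_{n-1} 1}(y_{(l+n-1)L}) \le p^{(L-1)}_{11}(y_{(l+n-1)L}),$$

the maximization (44) above is achieved as follows:

$$p^{(nL-1)}_{11}(y_{lL}) = p^{(L-1)}_{11}(y_{lL}) f_1(y_{(l+1)L}) \cdots p^{(L-1)}_{11}(y_{(l+n-2)L}) f_1(y_{(l+n-1)L})\, p^{(L-1)}_{11}(y_{(l+n-1)L}). \tag{45}$$

Now, we replace state 1 by generic states $i, j \in S$ on both ends of the paths in (44) and repeat the above arguments. Thus, also using (45), we arrive at bound (46) below:

$$p^{(nL-1)}_{ij}(y_{lL}) f_j(y_{(l+n)L}) \le \prod_{u=l+1}^{l+n} p^{(L-1)}_{11}(y_{(u-1)L}) f_1(y_{uL}) \overset{(45)}{=} p^{(nL-1)}_{11}(y_{lL}) f_1(y_{(l+n)L}) \quad \forall i, j \in S. \tag{46}$$

In particular, (46) states that $\forall i, j \in S$

$$p^{(kL-1)}_{ij}(y_0) f_j(y_{kL}) \le p^{(kL-1)}_{11}(y_0) f_1(y_{kL}). \tag{47}$$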
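Expansion (44), and the first equality in the last display of the proof of (30), rest on one splitting property of the maximum partial likelihoods: fixing the state $h$ occupied at an intermediate time splits a best path into two best paths, $p^{(a+b+1)}_{ij}(u) = \max_h p^{(a)}_{ih}(u) f_h(x_{u+a+1}) p^{(b)}_{hj}(u+a+1)$. The Python sketch below checks this numerically on a random model. It assumes the reading $p^{(r)}_{ij}(u) = \max_{q_{1:r}} p_{i q_1} f_{q_1}(x_{u+1}) \cdots f_{q_r}(x_{u+r}) p_{q_r j}$ of (3), which is not reproduced in this section; the function names are hypothetical.

```python
import math
import random


def partial_max(P, E, u, r):
    """p^{(r)}_{ij}(u) with E[t-1][h] = f_h(x_t); r = 0 returns p_ij."""
    K = len(P)
    M = [row[:] for row in P]
    for s in range(r):
        f = E[u + s]  # f_h(x_{u+s+1})
        M = [[max(M[i][h] * f[h] * P[h][j] for h in range(K)) for j in range(K)]
             for i in range(K)]
    return M


def split_holds(P, E, u, a, b):
    """Check p^{(a+b+1)}_{ij}(u) = max_h p^{(a)}_{ih}(u) f_h(x_{u+a+1}) p^{(b)}_{hj}(u+a+1)."""
    K = len(P)
    whole = partial_max(P, E, u, a + b + 1)
    left = partial_max(P, E, u, a)
    right = partial_max(P, E, u + a + 1, b)
    f_mid = E[u + a]  # f_h(x_{u+a+1}): emission at the splitting time
    for i in range(K):
        for j in range(K):
            glued = max(left[i][h] * f_mid[h] * right[h][j] for h in range(K))
            if not math.isclose(whole[i][j], glued, rel_tol=1e-12):
                return False
    return True


rng = random.Random(3)
K, n = 3, 10
P = [[rng.random() + 0.01 for _ in range(K)] for _ in range(K)]
P = [[x / sum(row) for x in row] for row in P]
E = [[rng.random() for _ in range(K)] for _ in range(n)]
print(all(split_holds(P, E, 1, a, b) for a in range(3) for b in range(3)))
```

With $a = L-1$ and $b = (n-1)L-1$ one split peels off the first block of (44); with $a = P-1$ and $b = m-1$ it is the decomposition at $y''_P$ used for (30).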
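12) $\eta_j \le \mathrm{const} \times \eta_1$: In order to see that

$$\eta_j \le \frac{1}{q}\left(\frac{K}{\delta}\right)^m A^R \eta_1 \quad \forall j \in S, \tag{48}$$

note that

$$\eta_j \overset{(25)}{=} \max_i \gamma_i p^{(kL-1)}_{ij}(y_0) f_j(y_{kL}) \overset{(47)}{\le} \max_i \gamma_i p^{(kL-1)}_{11}(y_0) f_1(y_{kL}) \overset{(43)}{\le} \frac{1}{q}\left(\frac{K}{\delta}\right)^m A^R \gamma_1 p^{(kL-1)}_{11}(y_0) f_1(y_{kL}) \overset{(25)}{\le} \frac{1}{q}\left(\frac{K}{\delta}\right)^m A^R \eta_1.$$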

13) A representation of $\eta_1$: Recall that $k$, the number of cycles in the s-path, was chosen sufficiently large for (18) to hold (in particular, $k > 1$). We now prove that there exists $\kappa \in \{1, \ldots, k-1\}$ such that

$$\eta_1 = \delta_1(y_{\kappa L})\, p^{((k-\kappa)L-1)}_{11}(y_{\kappa L})\, f_1(y_{kL}). \tag{49}$$

The relation (49) states that (given observations $x_{1:u}$) a maximum-likelihood path from time 1 (observation $x_1$) to time $u-m-P-kL$ (observation $y_{kL}$) goes through state 1 at time $u-m-P-2kL+\kappa L$, that is, when $y_{\kappa L}$ is observed.

To see this, suppose no such $\kappa$ existed. Then, applying (4) to (25) and recalling that $\delta_1(y_{\kappa L})$ is introduced in (21), we would have

$$\eta_1 = \gamma_{i_\eta(1)}\, p^{(L-1)}_{i_\eta(1) j_1}(y_0) f_{j_1}(y_L)\, p^{(L-1)}_{j_1 j_2}(y_L) f_{j_2}(y_{2L}) \cdots p^{(L-1)}_{j_{k-1} 1}(y_{(k-1)L}) f_1(y_{kL})$$

for some $j_1 \ne 1, \ldots, j_{k-1} \ne 1$. Furthermore, this would imply

$$\begin{aligned}
\eta_1 &\overset{(28),(27)}{\le} \gamma_{i_\eta(1)} (1-\varepsilon)^{k-1} \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL})\\
&\overset{(43)}{\le} \gamma_1 \frac{1}{q}\left(\frac{K}{\delta}\right)^m A^R (1-\varepsilon)^{k-1} \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL})\\
&\overset{(18)}{<} \gamma_1 \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL}).
\end{aligned} \tag{50}$$

(The last inequality follows from $q < 1$ (16) and $\delta < K$, §IV-A2.) On the other hand, by definition (25) (and $(k-1)$-fold application of (4)), $\eta_1 \ge \gamma_1 \prod_{i=1}^{k} p^{(L-1)}_{11}(y_{(i-1)L}) f_1(y_{iL})$, which evidently contradicts (50) above. Therefore, $\kappa$ satisfying (49) and $1 \le \kappa < k$ does exist.

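14) An implication of (45) and (49) for $\delta_1(y_{lL})$: Clearly, the arguments of the previous section (§IV-A13) are valid if $k$ is replaced by any $l \in \{k, \ldots, 2k\}$. Hence the following generalization of (49): for some $\kappa(l) < l$,

$$\delta_1(y_{lL}) = \delta_1(y_{\kappa(l) L})\, p^{((l-\kappa(l))L-1)}_{11}(y_{\kappa(l) L})\, f_1(y_{lL}). \tag{51}$$

We apply (51) recursively, starting with $\kappa^{(0)} := l$ and returning $\kappa^{(1)} := \kappa(l) < l$. If $\kappa^{(1)} < k$, we stop; otherwise we substitute $\kappa^{(1)}$ for $l$ and obtain $\kappa^{(2)} := \kappa(\kappa^{(1)}) < \kappa^{(1)}$, and so on, until $\kappa^{(j)} < k$ for some $j > 0$. Thus,

$$\delta_1(y_{lL}) = \delta_1(y_{\kappa^{(j)} L})\, p^{((\kappa^{(j-1)} - \kappa^{(j)})L-1)}_{11}(y_{\kappa^{(j)} L})\, f_1(y_{\kappa^{(j-1)} L}) \cdots p^{((l-\kappa^{(1)})L-1)}_{11}(y_{\kappa^{(1)} L})\, f_1(y_{lL}). \tag{52}$$

Applying (45) to the appropriate factors of the right-hand side of (52) above, we obtain

$$\begin{aligned}
\delta_1(y_{lL}) = {}& \delta_1(y_{\kappa^{(j)} L})\, p^{(L-1)}_{11}(y_{\kappa^{(j)} L}) f_1(y_{(\kappa^{(j)}+1)L}) \cdots p^{(L-1)}_{11}(y_{(k-1)L}) f_1(y_{kL})\, p^{(L-1)}_{11}(y_{kL}) f_1(y_{(k+1)L}) \cdots\\
& p^{(L-1)}_{11}(y_{(\kappa^{(1)}-1)L}) f_1(y_{\kappa^{(1)} L}) \cdots p^{(L-1)}_{11}(y_{(l-1)L}) f_1(y_{lL}).
\end{aligned} \tag{53}$$

Also, according to (45),

$$\delta_1(y_{\kappa^{(j)} L})\, p^{(L-1)}_{11}(y_{\kappa^{(j)} L}) f_1(y_{(\kappa^{(j)}+1)L}) \cdots p^{(L-1)}_{11}(y_{(k-1)L}) = \delta_1(y_{\kappa^{(j)} L})\, p^{((k-\kappa^{(j)})L-1)}_{11}(y_{\kappa^{(j)} L}).$$

At the same time,

$$\delta_1(y_{\kappa^{(j)} L})\, p^{((k-\kappa^{(j)})L-1)}_{11}(y_{\kappa^{(j)} L})\, f_1(y_{kL}) \overset{(4)}{\le} \eta_1. \tag{54}$$

However, we cannot have the strict inequality in (54) above, since that, by virtue of (53), would contradict the maximality of $\delta_1(y_{lL})$. We have thus arrived at

$$\delta_1(y_{lL}) = \eta_1\, p^{(L-1)}_{11}(y_{kL}) f_1(y_{(k+1)L}) \cdots p^{(L-1)}_{11}(y_{(l-1)L}) f_1(y_{lL}). \tag{55}$$

In summary, for any $l$ with $k \le l \le 2k$ there exists a realization of $\delta_1(y_{lL})$ that goes through state 1 every time $y_{iL}$, $i = k, \ldots, l$, is observed.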

15) $y_{kL}$ is a $(kL+m+P)$-order 1-node: In §IV-A16, we will prove that for any $i \in S$, $i \ne 1$, and any $j \in C$,

$$\eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) \le \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL}), \tag{56}$$

which implies that $y_{kL}$ is a 1-node of order $kL+m+P$. Indeed, let $l \in S$ be arbitrary. Since $f_j(z'_m) = 0$ for every $j \in S \setminus C$, any maximum-likelihood path to state $l$ at time $u+1$ (observation $x_{u+1}$) must go through a state in $C$ at time $u$ (observation $x_u = z'_m$). Formally,

$$\begin{aligned}
\eta_i\, p^{(kL+m+P)}_{il}(y_{kL}) &= \max_{j \in C} \eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) f_j(z'_m) p_{jl}\\
&\overset{(56)}{\le} \max_{j \in C} \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL}) f_j(z'_m) p_{jl} \overset{(4)}{=} \eta_1\, p^{(kL+m+P)}_{1l}(y_{kL}).
\end{aligned}$$

Therefore, by Definition 2.1, $y_{kL}$ is a 1-node of order $kL+m+P$.
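Definition 2.1 is stated earlier in the paper; the form used in the display above is that $x_u$ is an $l$-node of order $r$ when $\delta_u(l) p^{(r)}_{lj}(u) \ge \delta_u(i) p^{(r)}_{ij}(u)$ for all $i, j \in S$. The Python sketch below, assuming that reading, tests the condition directly and then checks, on a hypothetical toy model in which observation $x_u$ can only be emitted by state $l$ (so that $\delta_u(i) = 0$ for $i \ne l$), that the backtracked Viterbi path passes through $l$ at time $u$. The model, the tie-breaking rule (lowest state index wins) and the function names are illustrative assumptions.

```python
import random


def partial_max(P, E, u, r):
    """p^{(r)}_{ij}(u) with E[t-1][h] = f_h(x_t); r = 0 returns p_ij."""
    K = len(P)
    M = [row[:] for row in P]
    for s in range(r):
        f = E[u + s]
        M = [[max(M[i][h] * f[h] * P[h][j] for h in range(K)) for j in range(K)]
             for i in range(K)]
    return M


def viterbi(pi, P, E):
    """Return the scores D (D[u-1][i] = delta_u(i)) and one backtracked MAP path."""
    K, n = len(pi), len(E)
    D = [[pi[i] * E[0][i] for i in range(K)]]
    back = []
    for t in range(1, n):
        prev = D[-1]
        ptr = [max(range(K), key=lambda i: prev[i] * P[i][j]) for j in range(K)]
        D.append([prev[ptr[j]] * P[ptr[j]][j] * E[t][j] for j in range(K)])
        back.append(ptr)
    path = [max(range(K), key=lambda i: D[-1][i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return D, path[::-1]


def is_node(D, P, E, u, l, r):
    """x_u is an l-node of order r: delta_u(l) p^{(r)}_{lj}(u) >= delta_u(i) p^{(r)}_{ij}(u)."""
    K = len(P)
    M = partial_max(P, E, u, r)
    return all(D[u - 1][l] * M[l][j] >= D[u - 1][i] * M[i][j]
               for i in range(K) for j in range(K))


rng = random.Random(7)
K, n, u, l = 3, 9, 4, 0
P = [[rng.random() + 0.05 for _ in range(K)] for _ in range(K)]
P = [[x / sum(row) for x in row] for row in P]
E = [[rng.random() + 0.05 for _ in range(K)] for _ in range(n)]
E[u - 1] = [1.0 if i == l else 0.0 for i in range(K)]  # only state l can emit x_u
D, path = viterbi([1.0 / K] * K, P, E)
print(is_node(D, P, E, u, l, r=2), path[u - 1] == l)
```

The barrier construction above plays the same role in a far more general setting: instead of forcing $\delta_u(i) = 0$, it makes every competitor of state 1 lose by a factor that (18) guarantees to be large enough.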
16) Proof of (56): Let $i \in S$ and $j \in C$ be arbitrary. Let state $j^* \in S$ be such that

$$p^{(kL+m+P-1)}_{ij}(y_{kL}) = p^{(kL-1)}_{ij^*}(y_{kL}) f_{j^*}(y_{2kL})\, p^{(m+P-1)}_{j^* j}(y_{2kL}) = \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}),$$

where

$$\nu(i, j) := p^{(kL-1)}_{ij}(y_{kL}) f_j(y_{2kL}) \quad \text{for all } i, j \in S.$$

We consider the following two cases separately:

1. There exists a path realizing $p^{(kL-1)}_{ij^*}(y_{kL})$ and going through state 1 at the time of observing $y_{lL}$ for some $l \in \{k, \ldots, 2k\}$, i.e.

$$p^{(kL-1)}_{ij^*}(y_{kL}) = p^{((l-k)L-1)}_{i1}(y_{kL}) f_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}). \tag{57}$$

Equation (57) above together with the fundamental recursion (4) yields the following:

$$\eta_i\, p^{(kL-1)}_{ij^*}(y_{kL}) \overset{(57)}{=} \eta_i\, p^{((l-k)L-1)}_{i1}(y_{kL}) f_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}) \overset{(21),(4)}{\le} \delta_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}). \tag{58}$$

At the same time, the right-hand side of (58) can be expressed as follows:

$$\delta_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}) \overset{(55),(45)}{=} \eta_1\, p^{((l-k)L-1)}_{11}(y_{kL}) f_1(y_{lL})\, p^{((2k-l)L-1)}_{1j^*}(y_{lL}) \le \eta_1\, p^{(kL-1)}_{1j^*}(y_{kL}). \tag{59}$$

Therefore, if there exists $l \in \{k, \ldots, 2k\}$ such that (57) holds, we have by virtue of (58) and (59):

$$\eta_i\, p^{(kL-1)}_{ij^*}(y_{kL}) \le \eta_1\, p^{(kL-1)}_{1j^*}(y_{kL}), \quad \text{that is,} \quad \eta_i\, \nu(i, j^*) \le \eta_1\, \nu(1, j^*). \tag{60}$$

Hence,

$$\eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) = \eta_i\, \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \overset{(60)}{\le} \eta_1\, \nu(1, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \overset{(4)}{\le} \eta_1\, p^{(kL+m+P-1)}_{1j}(y_{kL}),$$

and (56) holds.

2. Assume now that no path exists to satisfy (57). Argue as for (50) to obtain

$$\nu(i, j^*) \le (1-\varepsilon)^{k-1} \prod_{n=k+1}^{2k} p^{(L-1)}_{11}(y_{(n-1)L}) f_1(y_{nL}). \tag{61}$$

By (45), the (partial likelihood) product in the right-hand side of (61) equals $\nu(1, 1)$. Thus,

$$\begin{aligned}
\eta_i\, \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) &\overset{(61)}{\le} \eta_i (1-\varepsilon)^{k-1} \nu(1,1)\, p^{(m+P-1)}_{j^* j}(y_{2kL})\\
&\overset{(30)}{\le} \eta_i (1-\varepsilon)^{k-1} \frac{1}{q}\left(\frac{K}{\delta}\right)^{m-1} \nu(1,1)\, p^{(m+P-1)}_{1j}(y_{2kL})\\
&\overset{(42),(48)}{\le} \eta_1 (1-\varepsilon)^{k-1} \frac{1}{q^2}\left(\frac{K}{\delta}\right)^{2m-1} A^R\, \nu(1,1)\, p^{(m+P-1)}_{1j}(y_{2kL})\\
&\overset{(18)}{\le} \eta_1\, \nu(1,1)\, p^{(m+P-1)}_{1j}(y_{2kL}).
\end{aligned} \tag{62}$$

Hence,

$$\eta_i\, p^{(kL+m+P-1)}_{ij}(y_{kL}) = \eta_i\, \nu(i, j^*)\, p^{(m+P-1)}_{j^* j}(y_{2kL}) \overset{(62)}{\le} \eta_1\, \nu(1,1)\, p^{(m+P-1)}_{1j}(y_{2kL}),$$

which, by virtue of (4), implies (56).

17) Completion of the s-path to $q_{1:M}$ and conclusion: Finally, let

$$M = 2m + 2Lk + P + R + 2, \quad r = kL + P + m, \quad l = 1.$$

Recall from §IV-A3 that $b_0 \in C$. Since all the entries of $Q^m$ are positive, there exists a path $v_{0:m-1}$ in $C$ such that $p_{v_i v_{i+1}} > 0$ and $p_{v_{m-1} b_0} > 0$. Similarly, there must exist a path $u_{1:m}$ in $C$ such that $p_{u_i u_{i+1}} > 0$, $\forall i = 1, \ldots, m-1$, and $p_{a_P u_1} > 0$ (recall that $a_P \in C$). Hence, by these and the constructions of §IV-A5, all of the transitions of the following sequence occur with positive probabilities:

$$q_{1:M} := (v_{0:m-1}, b_{0:R}, s_{1:2Lk}, a_{1:P}, u_{1:m}). \tag{63}$$

Clearly, the actual probability of observing $q_{1:M}$ is positive, as required. By the constructions of §IV-A1 to §IV-A3, the conditional probability of $B$ below, given $q_{1:M}$, is evidently positive, as required:

$$B := \mathcal{Z}^{m+1} \times \mathcal{X}_{b_1} \times \cdots \times \mathcal{X}_{b_{R-1}} \times \mathcal{X}_1 \times \mathcal{X}_{s_1} \times \cdots \times \mathcal{X}_{s_{2kL-1}} \times \mathcal{X}_1 \times \mathcal{X}_{a_1} \times \cdots \times \mathcal{X}_{a_P} \times \mathcal{Z}^m.$$

Finally, since the sequence (20), i.e.

$$(z_{0:m}, y'_{1:R-1}, y_{0:2Lk}, y''_{1:P}, z'_{1:m}) \in B,$$

was chosen from $B$ arbitrarily (§IV-A6) and has been shown to be an $l$-barrier of order $r$, this completes the proof of the Lemma.

B. Proof of Lemma 3.2 

Proof: We use the notation of the previous proof in §IV-A and consider the following two distinct situations. First (§IV-B1), all barriers from $B$ as constructed in the proof of Lemma 3.1 are already separated; obviously, there is nothing to do in this case. The second situation (§IV-B2) is complementary, in which case a simple extension will immediately ensure separation.

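1) All $y \in B$ are already separated: Recall the definition of $\mathcal{Z}$ from §IV-A2. Consider the two cases in the definition separately. First, suppose $\mathcal{Z} = \mathcal{Z} \setminus (\cup_{l \in S} \mathcal{X}_l)$, in which case $\mathcal{Z}$ and $\mathcal{X}_l$ are disjoint for every $l \in S$. This implies that every barrier (20) is already separated. Indeed, for any $w$, $1 \le w \le r$, and for any $y \in B$, the fact that $y_{M-\max(m,w)} \notin \mathcal{Z}$, for example, makes it impossible for $(y'_{1:w}, y_{1:M-w}) \in B$ for any $y'_{1:w} \in \mathcal{X}^w$. Consider now the case when $\mathcal{Z} = \mathcal{Z} \cap \mathcal{X}_s$ for some $s \in C$. Then

$$B \subset \mathcal{X}_s^{m+1} \times \mathcal{X}_{b_1} \times \cdots \times \mathcal{X}_{b_{R-1}} \times \mathcal{X}_1 \times \mathcal{X}_{s_1} \times \cdots \times \mathcal{X}_{s_{2kL-1}} \times \mathcal{X}_1 \times \mathcal{X}_{a_1} \times \cdots \times \mathcal{X}_{a_{P-1}} \times \mathcal{X}_s^{m+1}. \tag{64}$$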

Let $y \in B$ be arbitrary. Assume first $L > 1$. By construction (§IV-A3), the states $s_1, \ldots, s_L$ are all distinct. We now show that $(y'_{1:w}, y_{1:M-w}) \notin B$ for any $y'_{1:w} \in \mathcal{X}^w$ when $1 \le w \le r$. Note that the sequence

$$q_{m+2:m+R+2kL+P+1} = (b_{1:R-1}, 1, s_{1:2kL-1}, 1, a_{1:P-1}, s)$$

is such that no two consecutive states are equal. It is straightforward to verify that there exist indices $j$, $0 \le j \le m-1$, such that, when shifted $w$ positions to the right, the pair $y_{j+1:j+2} \in \mathcal{X}_s^2$ would at the same time have to belong to $\mathcal{X}_{q_{j+1+w}} \times \mathcal{X}_{q_{j+2+w}}$ with $m+1 \le j+1+w < j+2+w \le m+R+2kL+1+P$. This is clearly a contradiction, since $\mathcal{X}_{q_{j+1+w}}$ and $\mathcal{X}_{q_{j+2+w}}$ are disjoint for that range of indices $j$. A verification of the above fact simply amounts to verifying that the inequality $\max(0, m-w) \le j \le \min(m-1, m+R+2kL-1+P-w)$ is consistent for any $w$ from the admissible range:

i) When $0 \ge m-w$ and $m-1 \le m+R+2kL-1+P-w$ ($m \le w \le \min(r, R+2kL+P)$), $0 \le j \le m-1$ is evidently consistent.

ii) When $0 \ge m-w$ and $m-1 > m+R+2kL-1+P-w$ ($\max(m, R+2kL+P) < w \le r$), $0 \le j \le m+R+2kL-1+P-w$ is also consistent, since $m+R+2kL-1+P-r = R+kL-1 \ge 0$.

iii) When $0 < m-w$ and $m-1 \le m+R+2kL-1+P-w$ ($1 \le w \le \min(m-1, R+2kL+P)$), $m-w \le j \le m-1$ is consistent since $w \ge 1$.

iv) When $0 < m-w$ and $m-1 > m+R+2kL-1+P-w$ ($\max(1, R+2kL+P+1) \le w \le m-1$), $m-w \le j \le m+R+2kL-1+P-w$ is consistent since $R+2kL-1 > 0$.

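Next consider the case of $L = 1$ but $s \ne 1$ (that is, $P > 0$). Then

$$B \subset \mathcal{X}_s^{m+1} \times \mathcal{X}_{b_1} \times \cdots \times \mathcal{X}_{b_{R-1}} \times \mathcal{X}_1^{2k+1} \times \mathcal{X}_{a_1} \times \cdots \times \mathcal{X}_{a_{P-1}} \times \mathcal{X}_s^{m+1}.$$

If $s \ne 1$, then also $b_i \ne 1$, $i = 1, \ldots, R-1$, and $a_i \ne 1$, $i = 1, \ldots, P-1$. To see that $y$ is separated in this case, simply note that $y_{M-\max(w, m+1)} \notin \mathcal{X}_s$ for any admissible $w$.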

2) Barriers $y \in B$ need not be separated: Finally, we consider the case when $L = 1$ and $s = 1$ (where $s \in C$ is such that $\mathcal{Z} = \mathcal{Z} \cap \mathcal{X}_s$). This implies that $P = 0$, $1 \in C$, and $p_{11} > 0$, which in turn implies that $R = 1$, and

$$B \subset \mathcal{X}_1^{m+1} \times \mathcal{X}_1^{2k+1} \times \mathcal{X}_1^{m+1} = \mathcal{X}_1^{2m+2k+3}.$$

Clearly, the barriers from $B$ need not be, and indeed are not, separated. It is, however, easy to extend them to separated ones. Indeed, let $q_0 \ne 1$ be such that $p_{q_0 1} > 0$ and redefine $B := \mathcal{X}_{q_0} \times B$. Evidently, any shift of any $y \in B$ by $w$ ($1 \le w \le r$) positions to the right makes it impossible for $y_1$ to be simultaneously in $\mathcal{X}_{q_0}$ and in $\mathcal{X}_1$ (since the latter sets are disjoint, §IV-A1). ■
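Both parts of this proof compare a barrier with copies of itself shifted $w$ positions to the right, $1 \le w \le r$. When $B$ is contained in a product of sets drawn from the pairwise disjoint $\mathcal{X}_l$ (§IV-A1), a shift by $w$ is ruled out as soon as some coordinate would have to lie in two different such sets. The Python sketch below encodes each factor set by a label (distinct labels are assumed to denote disjoint sets, equal labels the same set) and returns `True` only if every shift is ruled out this way. It is a sufficient check written for illustration, not a procedure from the paper, and the names and the values of `m`, `k` are hypothetical.

```python
def shift_ruled_out(labels, w):
    """True if, for some t, y_t (in set labels[t]) would also have to lie in the
    disjoint set labels[t + w] after a shift by w positions to the right."""
    return any(labels[t] != labels[t + w] for t in range(len(labels) - w))


def certified_separated(labels, r):
    """Sufficient condition for separation: every shift w = 1..r is ruled out."""
    return all(shift_ruled_out(labels, w) for w in range(1, r + 1))


# The situation of Section IV-B2: every factor is X_1, so no shift is ruled out.
m, k = 2, 3
r = k + m  # r = kL + P + m with L = 1 and P = 0
labels = ["X1"] * (2 * m + 2 * k + 3)
print(certified_separated(labels, r))            # False
print(certified_separated(["Xq0"] + labels, r))  # True: X_{q0} is disjoint from X_1
```

The check is deliberately one-sided: when some factors are only supersets of the true coordinate sets (for instance $\mathcal{Z} \subset \mathcal{X}_s$), equal labels do not prove that a shifted copy fits, so a `False` answer only means that separation was not certified.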



V. Conclusion 

As discussed in §I, and in §I-A in particular, the proper infinite alignments (§II-B) allow us to define the decoding process $V$, which is regenerative and can further be stationarized to become ergodic [7]. This in turn allows us to study the distribution and asymptotic properties not only of the Viterbi process $V$ but also of the joint process $(X, V)$. In particular, this reveals how different these properties are from the properties of the underlying chain $Y$ and the HMM $(X, Y)$, respectively. More specifically, since the process $V$ (resp. $(X, V)$) can deviate significantly from the process $Y$ (resp. $(X, Y)$), using the Viterbi alignments $v_{1:n}$ as estimates for the hidden paths $Y_{1:n}$ might lead to incorrect conclusions not only for finite $n$ (as is generally appreciated) but also in the limit as $n \to \infty$ [7].

This certainly does not mean that one should not make inference based on $V$, but simply suggests that the aforementioned differences may need to be taken into account. One example of how these asymptotic differences can be successfully accounted for is the adjusted Viterbi training for HMM parameter estimation [11], [12], [7].

If known (possibly estimated), these differences might also be taken into account when the Viterbi paths are used for prediction, or segmentation, of $Y$, e.g., in speech segmentation, in segmentation of DNA sequences into coding and non-coding regions, or in detection of CpG islands in DNA sequences [15]. Indeed, in segmentation of DNA sequences, the underlying chain $Y$ has few, often two, states (e.g., coding and non-coding regions, or CpG islands and non-CpG regions), and the probabilities of transitions between the states are very low; hence the true ($Y$) and predicted ($V$) hidden paths consist of long constant blocks. At the same time, it has been noted that the predicted constant blocks can be somewhat longer than what the chain parameters would suggest. With the help of the infinite Viterbi process $V$, it is now clear that this discrepancy is not simply due to random fluctuations but is systematic, does not vanish asymptotically, and is a direct consequence of the fact that the transition probabilities of $V$ do indeed often underestimate the true ones. Note that in these examples, unlike in the estimation of the HMM emission parameters, the overall performance is directly linked to the accuracy of the transition probability estimates. Thus, finding the differences between the processes $(X, Y)$ and $(X, V)$ in this case might help find better alignments.
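One way to probe the block-length effect described above numerically is to simulate a two-state HMM with rare transitions, decode it with the Viterbi algorithm, and compare the average length of constant blocks in the true path $Y_{1:n}$ and in the decoded path $v_{1:n}$. The Python sketch below does this for a hypothetical symmetric chain with unit-variance Gaussian emissions; the parameter values and the seed are arbitrary, no particular outcome is claimed, and a single finite run says nothing definitive about the asymptotic behavior discussed above.

```python
import math
import random


def simulate(n, p_stay, means, rng):
    """Symmetric two-state chain started uniformly (its stationary law), N(mean, 1) emissions."""
    y = [rng.randrange(2)]
    for _ in range(n - 1):
        y.append(y[-1] if rng.random() < p_stay else 1 - y[-1])
    x = [rng.gauss(means[s], 1.0) for s in y]
    return y, x


def viterbi_log(x, p_stay, means):
    """Log-space Viterbi decoding for the same two-state model."""
    logp = [[math.log(p_stay), math.log(1 - p_stay)],
            [math.log(1 - p_stay), math.log(p_stay)]]

    def log_f(i, xt):  # Gaussian log-density with unit variance
        return -0.5 * math.log(2 * math.pi) - 0.5 * (xt - means[i]) ** 2

    d = [math.log(0.5) + log_f(i, x[0]) for i in range(2)]
    back = []
    for xt in x[1:]:
        ptr = [max(range(2), key=lambda i: d[i] + logp[i][j]) for j in range(2)]
        d = [d[ptr[j]] + logp[ptr[j]][j] + log_f(j, xt) for j in range(2)]
        back.append(ptr)
    path = [max(range(2), key=lambda i: d[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def mean_block_length(path):
    """Average length of the maximal constant blocks of a path."""
    switches = sum(1 for a, b in zip(path, path[1:]) if a != b)
    return len(path) / (switches + 1)


rng = random.Random(2024)
y, x = simulate(20000, 0.99, (0.0, 1.0), rng)
v = viterbi_log(x, 0.99, (0.0, 1.0))
print(mean_block_length(y), mean_block_length(v))
```

Working in log space avoids the underflow that the raw scores $\delta_u(i)$ would suffer for long sequences; the decoded path is unchanged because the logarithm is monotone.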